Ratchets Capture Progress: Steps to Continuous Integration

A ratchet is a mechanism for locking in progress. Frequent builds and regression tests represent ratchets for development.

A Short History of a Well

Imagine a well: a fairly deep hole in the ground, with water at the bottom. Without a lot more digging, it’s too hard to get down to the water. So early on, two technologies came together: a bucket and a rope. You can lower the bucket down to the water, and pull it up. (I suspect it didn’t take long to realize that you probably want to tie the other end to a tree.)

Water is fairly heavy: a bit more than 8 pounds per gallon. (“A pint’s a pound the world around,” my mother taught me, as both are 16 ounces; but it’s a good approximation for weight too.) Some random internet site gives me the statistic: 243 gallons/day for an average US family of four. That’d be a lot of water to haul up by hand.

So another technology can help. Imagine a typical wishing well: one end of the rope is tied to a bar, and the bar is attached to a crank. By turning the crank, the rope wraps around the bar, and the bucket is pulled up. The crank gives us leverage: we pull up the bucket more slowly, but we can lift a heavier weight.

This is great, but after you’ve done it for a while you realize there’s another problem: if you get tired halfway through, you can’t really stop. Sure, you can stop cranking, but you have to hold the lever in place. I’m sure someone rigged up a rope to hold the crank handle in place. But someone asked, “What if we could make a wheel that only turned in one direction?”

A ratchet is such a device. It looks something like this:

The part at the top is called a pawl. The pawl can swing around (though one end is fixed). The teeth on the gear are tilted so that the pawl will slip over the sloped side when the crank is turned clockwise, but the pawl will hold the gear in place if it tries to go counter-clockwise. (To let the bucket back down, you knock the pawl out of the way.)

This is a great idea. It may take a lot of work to make progress, but once you do, the ratchet locks it in. If you get tired or distracted, it’s ok: you won’t lose what you’ve previously accomplished.

The Build Ratchet

It’s notoriously hard to assess progress in software development. (There’s an old saying, “It’s 90% done; now I just have to do the other 90%.”) One reason is that it’s hard to know that your code will work with that of other people. It’s very easy to get out of sync with what other people have done.

Frequent builds help with this problem. A build and smoke test helps guard against simple integration problems. It doesn’t catch everything, but it does detect cases where things are so incompatible they don’t compile, or where the system compiles but fails the most obvious test.
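
As a rough sketch (the build and smoke-test commands below are placeholders, not anything prescribed here), the build side of this ratchet can be as small as a script that rebuilds the system, runs a quick test, and stops loudly if either step breaks:

    # build_and_smoke.py -- a minimal build-and-smoke-test sketch.
    # The "make" targets and smoke_test.py are placeholders; substitute
    # whatever builds and exercises your own system.
    import subprocess
    import sys

    STEPS = [
        ["make", "clean"],            # start from a known state
        ["make", "all"],              # does everything still compile and link?
        ["python", "smoke_test.py"],  # does the built system pass an obvious test?
    ]

    def main() -> int:
        for step in STEPS:
            print("running:", " ".join(step))
            result = subprocess.run(step)
            if result.returncode != 0:
                # The ratchet has slipped: report it, so fixing the build
                # becomes the team's top priority.
                print("BUILD BROKEN at:", " ".join(step))
                return result.returncode
        print("Build and smoke test passed.")
        return 0

    if __name__ == "__main__":
        sys.exit(main())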

The new mantra becomes “don’t break the build.” When the build is broken, it’s the team’s highest priority to fix it. Successful teams often treat this as “all hands on deck.” The fix may be as simple as reverting the last checkin, or it may be a more complicated negotiation. The important thing is not to let things get any worse.

What does it ask of developers?

  • Don’t check in partial changes, unless you can do it in a way that doesn’t cause problems. (For example, adding a new class is probably not a problem. But if you change a call interface, you need to check in the updated callers as well.)
  • Check in daily (or more). If each developer keeps their files checked out all month, we aren’t resolving integration problems any sooner. (And if you check in before going home each day, you’ll have even fewer problems.)
  • Merge often. Pick up the changes from the mainline into your sandbox while you’re working. Before you check in, merge to the latest checked-in version.

Frequent builds (along with the discipline to fix the problems that arise) act as a ratchet: each integration problem is fixed, and the system grows.
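
One way to make these habits mechanical is a small pre-checkin script: pick up the latest mainline changes, rebuild, and run the smoke test before you check in. This is only a sketch; it assumes a Subversion-style workflow and reuses the hypothetical build_and_smoke.py above, so substitute your own version-control and build commands.

    # pre_checkin.py -- merge, build, and smoke-test before checking in.
    # Assumes a Subversion working copy and the build_and_smoke.py sketch above;
    # both are illustrative rather than prescribed.
    import subprocess
    import sys

    def run(cmd) -> int:
        print("running:", " ".join(cmd))
        return subprocess.run(cmd).returncode

    def main() -> int:
        # Merge: pick up the latest checked-in changes into the sandbox.
        if run(["svn", "update"]) != 0:
            print("Update failed; resolve conflicts before checking in.")
            return 1
        # Build and smoke-test against the merged code.
        if run(["python", "build_and_smoke.py"]) != 0:
            print("Don't check in: you would break the build.")
            return 1
        print("Safe to check in (svn commit).")
        return 0

    if __name__ == "__main__":
        sys.exit(main())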

The Test Suite Ratchet

A build and smoke test guards against simple integration problems: what was last changed doesn’t break the build and passes a simple test. The next step is to maintain a full suite of tests (including system tests, regression tests, and so on). This suite should ideally test all areas of the application.

The new rule becomes “don’t regress.” The team regards any change that breaks an existing test as suspect, and makes fixing it the highest priority. (As before, the fix may be as simple as “revert the last checkin.”)
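
A regression test is this ratchet’s pawl in miniature: once a bug is fixed, a test pins the fix in place. Here’s an invented example (parse_quantity and the bug it guards against are hypothetical):

    # test_regressions.py -- a regression test keeps a fixed bug from coming back.
    # parse_quantity and the bug it guards against are invented for illustration.
    import pytest

    def parse_quantity(text: str) -> int:
        """Parse an order quantity; negative values are invalid."""
        value = int(text)
        if value < 0:
            raise ValueError("quantity must be non-negative")
        return value

    def test_parse_quantity_rejects_negative_values():
        # Hypothetical bug report: orders with quantity "-3" were once accepted.
        # If anyone re-introduces the old behavior, this test fails immediately.
        with pytest.raises(ValueError):
            parse_quantity("-3")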

There are basically four reasons why a test may fail:

  1. The new code has introduced (or re-introduced) a problem. This is the default assumption (guilty until proven innocent).
  2. There’s an environmental problem (e.g., the web server needs a restart, or some software isn’t installed). Fix the problem and try again.
  3. The test is wrong. It made an incorrect assertion, and the new code has revealed that problem.
  4. The test is fragile. It assumed something that’s no longer true, and now that change is causing a problem. (See the sketch below.)
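
As an invented sketch of reason 4, here is a test that’s fragile because it asserts an ordering the code never promised, next to a version that asserts only the real contract:

    # A hypothetical illustration of reason 4: a fragile test.
    def active_users(users):
        """Return the names of active users. Order is not part of the contract."""
        return [name for name, active in users if active]

    def test_active_users_fragile():
        # Fragile: insists on a particular ordering. A harmless change (say,
        # returning users sorted by last login) breaks this test even though
        # the code is still correct.
        assert active_users([("ann", True), ("bob", False), ("cal", True)]) == ["ann", "cal"]

    def test_active_users_robust():
        # Robust: asserts only what the function actually promises.
        result = active_users([("ann", True), ("bob", False), ("cal", True)])
        assert set(result) == {"ann", "cal"}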

This testing discipline asks something new of developers: don’t check in unless you’re sure you’re not causing a regression. The team may develop a subset of tests that can be run after a merge and before a checkin: these should be tests that tend to fail when something’s wrong.
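
One way to carve out that subset, sketched here with pytest (the “precheckin” marker name is arbitrary), is to tag the fast, high-signal tests and run only those before a checkin:

    # Tag the fast, high-signal tests so they can run after a merge and before
    # a checkin. This assumes pytest; the "precheckin" marker is arbitrary and
    # would be registered in pytest.ini to avoid warnings.
    import pytest

    @pytest.mark.precheckin
    def test_order_total_includes_tax():
        # A quick check that tends to fail when something central is wrong.
        assert round(100 * 1.05, 2) == 105.0

    def test_full_nightly_report():
        # A broader, slower test: left to the full suite rather than the
        # pre-checkin subset.
        assert sum(range(1000)) == 499500

Developers would then run “pytest -m precheckin” after merging and before checking in, while the regular build still runs the whole suite.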

The New Tests Ratchet

Teams develop their own consensus about what it means to check something in. A further ratchet is to say, “Only check in working code, demonstrated by new automated tests.” (Note that this has two parts, and requires automated tests.) This extends the test suite with a commitment to adding new tests.
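
As a minimal, invented sketch of what such a checkin contains, the new code and the new automated test that demonstrates it go in together (bulk_discount is hypothetical):

    # A checkin under the "new tests" rule bundles the feature with the test
    # that demonstrates it. bulk_discount is invented for illustration.

    def bulk_discount(quantity: int) -> float:
        """New feature: orders of 10 or more items get a 10% discount."""
        return 0.10 if quantity >= 10 else 0.0

    def test_bulk_discount_starts_at_ten_items():
        # The test goes into the suite in the same checkin as the feature,
        # so "done" means demonstrably done.
        assert bulk_discount(9) == 0.0
        assert bulk_discount(10) == 0.10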

Without something like this rule, you have a lag: features get added at one time, and the tests make it in later. But very often the tests reveal problems in the feature, so what you thought was done turns out not to be. Our goal is to know the true progress; we’re better off knowing it sooner rather than later.

How Far?

These techniques aren’t particularly new. The daily build and smoke test has been widely described for more than ten years, and was in use before that. Extreme Programming pushes these habits further, in the form of testing and continuous integration: the customer develops tests for each story, each pair of programmers writes tests before they develop the corresponding code, everybody integrates and checks in one or more times a day, and each time they check in they build the system and run all (or almost all) the tests.

It’s hard to measure progress. But these techniques let you better trust that when something is written and checked in, it represents true forward motion.

[Written April, 2004, by William C. Wake.]