Tag Archives: tdd

3A – Arrange, Act, Assert

Some unit tests are focused, other are like a run-on sentence. How can we create tests that are focused and communicate well?

What's a good structure for a unit test?

3A: Arrange, Act, Assert

We want to test the behavior of objects. One good approach is to put an object into each "interesting" configuration it has, and try various actions on it. 

Consider the various types of behaviors an object has:

  • Constructors
  • Mutators, also known as modifiers or commands
  • Accessors, also known as queries
  • Iterators

I learned this separation a long time ago but I don't know the source (though my guess would be some Abstract Data Type research). It's embodied in Bertrand Meyer's "command-query separation" principle, and others have independently invented it.

With those distinctions in mind, we can create tests:

Arrange: Set up the object to be tested. We may need to surround the object with collaborators. For testing purposes, those collaborators might be test objects (mocks, fakes, etc.) or the real thing.

Act: Act on the object (through some mutator). You may need to give it parameters (again, possibly test objects).

Assert: Make claims about the object, its collaborators, its parameters, and possibly (rarely!!) global state. 

Where to Begin?

You might think that the Arrange is the natural thing to write first, since it comes first.

When I'm systematically working through an object's behaviors, I may write the Act line first. 

But a useful technique I learned from Jim Newkirk is that writing the Assert first is a great place to start. When you have a new behavior you know you want to test, Assert First lets you start by asking "Suppose it worked; how would I be able to tell?" With the Assert in place, you can do what Industrial Logic calls "Frame First" and lean on the IDE to "fill in the blanks." 


Aren't some things easier to test with a sequence of actions and assertions?

Occasionally a sequence is needed, but the 3A pattern is partly a reaction to large tests that look like this:

  • Arrange
  • Act
  • Assert
  • Act
  • Assert
  • Arrange more
  • Act
  • Assert

To understand a test like that, you have to track state over a series of activities. It's hard to see what object is the focus of the test, and it's hard to see that you've covered each interesting case. Such multi-step unit tests are usually better off being split into several tests.

But I won't say "never do it"; there could be some case where the goal is to track a cumulative state and it's just easier to understand in one series of calls. 

Sometimes we want to make sure of our setup. Is it OK to have an extra assert?

Such a test looks like this:

  • Arrange
  • Assert that the setup is OK
  • Act
  • Assert that the behavior is right

First, consider whether this should be two separate tests, or whether setup is too complicated (if we can't trust objects to be in the initial state we want). Still, if it seems necessary to do this checking, it's worth bending the guideline.

What about the notion of having "one assert per test"?

I don't follow that guideline too closely. I consider it for two things: 

  1. A series of assertions may indicate the object is missing functionality which should be added (and tested). The classical case is equals()It's better to define an equals() method than (possibly create and) repeat a bunch of assertions about held data.
  2. A series of similar assertions might benefit from a helper (assertion) method.

(If an object has many accessors, it may indicate the object is doing too much.)

When a test modifies an object, I typically find it easiest to consider most accessors together. 

For example, consider a list that tracks the number of objects and the maximum entry. One test might look like this:

    List list = new List();
    assertEquals(1, list.size());
    assertEquals(3, list.max());

That is, it considers the case "what all happens when one item is inserted into an empty list?" Then the various assertions each explore a different "dimension" of the object.

What about setup?

Most xUnit frameworks let you define a method that is called before each test. This lets you pull out some common code for the tests, and it is part of the initial Arrange. (Thus you have to look in two places to understand the full Arrange-ment.)

What about teardown?

Most xUnit frameworks let you define a method that is called after each test. For example, if a test opens a file connection, the teardown could close that connection.

If you need teardown, use it, of course. But I'm not adding a fourth A to the pattern: most unit tests don't need teardown. Unit tests (for the bulk of the system) don't talk to external systems, databases, files, etc., and Arrange-Act-Assert is a pattern for unit tests. 


I (Bill Wake) observed and named the pattern in 2001. "Arrange-Act-Assert" has been the full name the whole time, but it's been variously abbreviated as AAA or 3A. Kent Beck mentions this pattern in his book Test-Driven Development: By Example (p. 97). This article was written in 2011. Added a description of Assert First and Frame First due to Brian Marick's comment. [4/26/11] 

Tests from a Hat

How does the order of tests affect the design of software?


  • Write a bunch of plausible unit test descriptions.
  • Slice them into strips and put them in a hat.
  • Repeat until the hat is empty:
    • Pull tests one at a time from a hat.
    • Write the indicated test
    • Write the code to make the last test pass, TDD style.
    • If you absolutely must create a prerequisite test, do so.
    • Refactor as you go.
  • Set aside your implementation, shuffle the tests back into the hat, and repeat once or twice.

Exercise ideas:

  • spreadsheet (domain objects, parsing, representation of storage),
  • auction sniper (from Freeman and Pryce’s Growing Object-Oriented Software),
  • video rental store (from Fowler et al’s Refactoring),
  • currency (from Beck’s Test-Driven Development),

Sample Tests for a Spreadsheet:

  1. New sheet is empty
  2. Inserting into a sheet stores a value
  3. Numeric cells are formatted as numbers
  4. Cells retain literal values (for editing) Ex: 007 is edited as 007 but displays as 7.
  5. Cells starting with = are formulas.
  6. Constant formula Ex. =7
  7. Parentheses in formulas
  8. Multiplication in formulas
  9. Addition in formulas
  10. Precedence in formulas

Reflection Points:

  • Did you encode your tests the same each time?
  • Did the solutions come out basically the same? Why or why not?
  • Did you see alternative solution paths when you went through again?

Bill Wake, 2010. Inspired by "Scenes From a Hat" in improv.

Review – Growing Object-Oriented Software, Guided by Tests

Growing Object-Oriented Software, Guided by Tests, by Steve Freeman and Nat Pryce, ISBN 0-321-50362-7

Freeman and Pryce explain Test-Driven Development through an extended example. (They have a somewhat different perspective than I do, with much heavier use of mock objects. I'm not sure what drives this difference; it may be due to type of application, philosophical reasons, or just something I should learn.) I love the clarity with which they tackle the problem of driving in from all the way outside, and how they don't shy away from dealing with concurrency and persistence. It's high praise for me to say a book deserves further study; this book is one I'll definitely be reading again. 


Sudoku Solver

Sudoku is a fairly well-known type of puzzle. Solving it turns out to be easier than I expected, but a somewhat odd example of test-driven development.

There's been discussion on the extremeprogramming group about solving Sudoku puzzles.  I saw Ron Jeffries first do a spike, and then start solving. Then Cory Foy pitched in his version as well. I skipped to the bottom of his article and found out it wasn't doing an exhaustive search, but rather looking for simple "forced" cells.

From the discussion, I had decided I wouldn't be fancy on the indexing: I'll just use a simple array, and define constraints as indexes on the array. I also decided to first do an exhaustive depth-first search, and then performance-tune if necessary.

First Test, Unusually Large

Here's my first test:

  public void testBoardThatHasToApplyConstraints() {
  Board board = new Board(new int[] {
    1, 2,
    0, 1 });
  ArrayList constraints = new ArrayList();
  constraints.add(new Constraint(new int[] { 0, 1 }));
  constraints.add(new Constraint(new int[] { 2, 3 }));
  constraints.add(new Constraint(new int[] { 0, 2 }));
  constraints.add(new Constraint(new int[] { 1, 3 }));
  Board result = new Solver().solve(board, constraints);
  Board expected = new Board(new int[] {
    1, 2,
    2, 1 });
  assertEquals(expected, result);

This isn't a real Sudoku puzzle but it's close enough to get me started.

I don't often start with a test this big but I wanted to capture the direction I plan to go. I used this chance to create stubs for the Board, Constraint, and Solver classes and their methods:

public class Board {
    public Board(int[] cells) {}


public class Constraint {
    public Constraint(int[] indexes) {}


public class Solver {
    public Board solve(Board board, ArrayList constraints) {
         return null;

As you'd expect, this test comes back red. It's definitely too advanced for where we are. So I'm going to rename the failing test method to xtestBoardThatHasToApplyConstraints(), (so it won't run) and take a smaller bite, what Kent Beck calls a Child Test.

Board Equality

My smaller step will be to test whether two boards are equal. It's much easier, and I have to do it anyways to make the final "assertEquals()" work, so it's a good step. I implemented the next series of tests one at a time, but I'll show them all together:

public class BoardTest extends TestCase {
    Board board;

    protected void setUp() {
        board = new Board(new int[] { 1, 2, 0, 1 });

    public void testBoardNotEqualToNull() {

    public void testBoardNotEqualToNonBoardObject() {
        assertFalse(board.equals(new Object()));

    public void testBoardsOfDifferentSizesArentEqual() {
        Board board2 = new Board(new int[] { 1, 2, 3 });

    public void testBoardsWithDifferentContentsArentEqual() {
        Board board2 = new Board(new int[] { 1, 2, 1, 1 });

    public void testBoardsWithSameContentsAreEqual() {
        Board board2 = new Board(new int[] { 1, 2, 0, 1 });
        assertEquals(board, board2);
        assertEquals(board2, board);

    public void testEqualityIsTransitive() {
        Board board2 = new Board(new int[] { 1, 2, 0, 1 });
        Board board3 = new Board(new int[] { 1, 2, 0, 1 });

        assertEquals(board, board2);
        assertEquals(board2, board3);
        assertEquals(board, board3);

The implementation is what you'd expect:

public class Board {
    int[] cells;

    public Board(int[] cells) {
        this.cells = cells;

    public boolean equals(Object obj) {
        if (obj == null) return false;
        if (obj.getClass() != Board.class) return false;
        Board that = (Board) obj;
        if (this.cells.length != that.cells.length) return false;
        for (int i = 0; i < cells.length; i++) 
            if (this.cells[i] != that.cells[i])
                return false;
        return true;

A Small, Full Grid

Rather than tackle that large test I started with, I have a much simpler case. Let's look at a board with everything filled in, under no constraints.

public void testSolvedBoardComesBackAsIsWithNoConstraints() {
    Board board = new Board(new int[] { 1, 2, 2, 1 });
    Board result = new Solver().solve(board, new ArrayList());
    assertEquals(board, result);

A straightforward solution makes that work:

public Board solve(Board board, ArrayList constraints) {
    if (solved(board, constraints))
        return board;
    throw new Error("No solution found");

private boolean solved(Board board, ArrayList constraints) {
    return true;

This next test is the same, but with constraints added. Since we always return the board we were given, we expect it to pass (and it does). A Constraint is a set of "n" positions; the constraint it enforces is that each number 1..n must appear exactly once in the cells with those positions.

public void testSolvedBoardComesBackAsIsWithConstraints() {
    Board board = new Board(new int[] { 1, 2, 2, 1 });
    ArrayList constraints = new ArrayList();
    constraints.add(new Constraint(new int[] { 0, 1 }));
    constraints.add(new Constraint(new int[] { 2, 3 }));
    constraints.add(new Constraint(new int[] { 0, 2 }));
    constraints.add(new Constraint(new int[] { 1, 3 }));

    Board result = new Solver().solve(board, constraints);

    assertEquals(board, result);

I'll do a bit of "Fake It" here. I want to say the board is solved if no constraint says it isn't. I'll run through the constraints, but for now they'll all answer "true" (i.e., that the puzzle is solved as far as that constraint is concerned). The tests stay green

// In Board.java:
private boolean solved(Board board, ArrayList constraints) {
    for (int i = 0; i < constraints.size(); i++) {
        Constraint constraint = (Constraint) constraints.get(i);
        if (!constraint.solvedBy(board)) return false;
    return true;
// In Constraint.java:
public boolean solvedBy(Board board) {
    return true;

To show that Constraint isn't picky enough, let's add a test:

public void testConstraintIsNotSolvedIfAnyIntegerIsMissing() {
    Board board = new Board(new int[] { 1, 2, 0, 1 });
    Constraint constraint = new Constraint(new int[] { 2, 3 });

Constraint makes a list of values used and returns false if any (digits 1 or greater) is not exactly 1.

// In Constraint.java:
public boolean solvedBy(Board board) {
    int[] valuesUsed = new int[board.width()+1];
    for (int i = 0; i < indexes.length; i++) 
    for (int i = 1; i < valuesUsed.length; i++) 
        if (valuesUsed[i] != 1)
            return false;
    return true;
// In Board.java:
public class Board {
    int[] cells;
    int width;

    public Board(int[] cells) {
        this.cells = cells;
        width = (int) Math.sqrt(cells.length);

    public int width() {
        return width;
    public int at(int i) {
        return cells[i];

    // etc.


We have the notion of whether a puzzle is solved (all constraints say they are solved). But there's another notion we need along the way: whether a board configuration is valid. A board is valid if it has no constraint that says it's not. And the constraints will allow 0s (empty cells), but not two cells that claim to own the same number. So: empty cells in a row is valid, but two 7s in a row are not.

public void testBoardIsValidIfNoConstraintsAreContradicted() {
    Board board = new Board(new int[] { 1, 2, 0, 1 });
    ArrayList constraints = new ArrayList();
    constraints.add(new Constraint(new int[] { 2, 3 }));
    assertTrue(new Solver().valid(board, constraints));

We can make valid() return true and get to green. It needs to do more when we test the other way:

public void testBoardIsInvalidIfAnyConstraintIsContradicted() {
    Board board = new Board(new int[] { 1, 2, 1, 1 });
    ArrayList constraints = new ArrayList();
    constraints.add(new Constraint(new int[] { 2, 3 }));

    assertFalse(new Solver().valid(board, constraints));


// In Solver.java: 
public boolean valid(Board board, ArrayList constraints) {
    for (int i = 0; i < constraints.size(); i++) {
        Constraint constraint = (Constraint) constraints.get(i);
        if (!constraint.accepts(board)) return false;
    return true;
// In Constraint.java:

public boolean accepts(Board board) {
    int[] valuesUsed = new int[board.width()+1];
    for (int i = 0; i < indexes.length; i++) 
    for (int i = 1; i < valuesUsed.length; i++) 
        if (valuesUsed[i] > 1)
            return false;
    return true;

If you look back at the earlier Constraint.solvedBy() method, you'll see that the first half of these two methods is identical. We'll extract a helper method:

public boolean solvedBy(Board board) {
    int[] valuesUsed = valuesUsed(board);
    for (int i = 1; i < valuesUsed.length; i++) 
        if (valuesUsed[i] != 1)
            return false;
    return true;

public boolean accepts(Board board) {
    int[] valuesUsed = valuesUsed(board);
    for (int i = 1; i < valuesUsed.length; i++) 
        if (valuesUsed[i] > 1)
            return false;
    return true;

private int[] valuesUsed(Board board) {
    int[] valuesUsed = new int[board.width()+1];
    for (int i = 0; i < indexes.length; i++) 
    return valuesUsed;

There's still duplication – those loops are very similar – but we'll leave it. In Ruby or Smalltalk, it might be easier to get rid of.

Depth-First Search

We're up to the edge of solving real Sudoku. When I do them by hand, I have a handful of strategies (look for forced cells, consider both down and across, etc.) But at least for my skill level, the hard problems have always required me to do a search: try it one way and see what happens, and backtrack if that fails.

My worst-case strategy is a standard strategy from AI: depth-first search. One algorithm for that says:

  • Make a stack and push the initial candidates.
  • While there are still things in the stack:
    • Pop the top item.
    • If it's a solution, return it.
    • If it's not a solution but it's valid, generate any consequent candidates and push them on the stack.
    • If it's neither a solution nor valid, don't push anything
  • If you have an empty stack at the end, you found no solution.

To represent my candidates, I'll have a Candidate class. It's just a data bag with three things we need to track: the board, the next index to change, and the value to put there. It's one of those rare classes so simple I won't write a test for it.

public class Candidate {
    private Board board;
    private int index;
    private int value;
    public Candidate(Board board, int index, int value) {
        this.board = board;
        this.index = index;
        this.value = value;

    public Board board() {
        return board;

    public int index() {
        return index;
    public int value() {
        return value;

In a search like this, we'll frequently take a board, and want to modify one cell. If we work on a copy of the board, backtracking is easy – just don't use that copy any more. 

public void testBoardCanBeBuiltWithOneValueChanged() {
    Board board2 = new Board(board, 0, 2);
    assertEquals(2, board2.at(0));
    assertEquals(new Board(new int[] { 2, 2, 0, 1 }), board2);

Create that new constructor, copying the cells and the width. (I forgot to copy the width on my first try:(

public Board(Board old, int index, int value) {
    this.cells = new int[old.cells.length];
    for (int i = 0; i < cells.length; i++)
        cells[i] = old.cells[i];
    width = old.width;

    cells[index] = value;

The next test can force the depth-first algorithm.

public void testBoardThatHasToApplyConstraints() {
    Board board = new Board(new int[] { 1, 2, 0, 1 });
    ArrayList constraints = new ArrayList();
    constraints.add(new Constraint(new int[] { 0, 1 }));
    constraints.add(new Constraint(new int[] { 2, 3 }));
    constraints.add(new Constraint(new int[] { 0, 2 }));
    constraints.add(new Constraint(new int[] { 1, 3 }));

    Board result = new Solver().solve(board, constraints);

    Board expected = new Board(new int[] { 1, 2, 2, 1 });
    assertEquals(expected, result);

As I go through this exposition, this just seems like too big a step. Let's try a smaller step that checks the generation of candidates.

public void testIfIndexIsTooBigThenNothingGetsPushed() {
    Board board = new Board(new int[] {1, 2, 0, 1});
    Stack stack = new Stack();
    new Solver().pushCandidates(stack, board, 4);

public void testIfCellAlreadyFilledOnlyThatBoardGetsPushed() {
    Board board = new Board(new int[] {1, 2, 0, 1});
    Stack stack = new Stack();
    new Solver().pushCandidates(stack, board, 0);
    assertEquals(1, stack.size());
    assertEquals(new Candidate(board, 0, 1), stack.peek());

public void testIfCellIsEmptyThenPushAllCandidates() {
    Board board = new Board(new int[] {1, 2, 0, 1});
    Stack stack = new Stack();
    new Solver().pushCandidates(stack, board, 2);
    assertEquals(2, stack.size());
    assertEquals(new Candidate(board, 2, 1), stack.elementAt(0));
    assertEquals(new Candidate(board, 2, 2), stack.elementAt(1));

This code makes those tests pass:

// In Solver.java:
public void pushCandidates(Stack stack, Board board, int index) {
    if (index >= board.width() * board.width()) return;

    if (board.has(index)) {
        stack.push(new Candidate(board, index, board.at(index)));

    for (int i = 1; i <= board.width(); i++)
        stack.push(new Candidate(board, index, i));
// In Candidate.java:
public boolean equals(Object obj) {
    if (obj == null || obj.getClass() != Candidate.class) return false;
    Candidate that = (Candidate) obj;
    return this.board.equals(that.board) 
        && this.index == that.index 
        && this.value == that.value;

// In Board.java:
public boolean has(int index) {
    return cells[index] != 0;

Let's return to testBoardThatHasToApplyConstraints(). Now I'll take a go at the search algorithm. Confession – I accidentally passed board (rather than newBoard) to pushCandidates(), and used the debugger to see I had the wrong board.

public Board solve(Board board, ArrayList constraints) {
    Stack stack = new Stack();
    pushCandidates(stack, board, 0);

    while (!stack.isEmpty()) {
        Candidate candidate = (Candidate) stack.pop();
        Board newBoard = new Board(candidate.board(), candidate.index(), candidate.value());

        if (solved(newBoard, constraints)) 
            return newBoard;
        if (valid(newBoard, constraints))
            pushCandidates(stack, newBoard, candidate.index() + 1);

    throw new Error("No solution found");

The  next two tests pass already. One's a small board, the other is the example from the Wikipedia article cited above. (I also tried a couple "hard" ones from the back of a Sudoku book, but I won't include them here.)

public void test4x4Board() {
    Board board = new Board(new int[] { 
            1,2, 0,0,
            0,0, 1,2,
            0,1, 0,4,
            4,0, 0,0
    ArrayList constraints = new ArrayList();
    // Row constraints
    constraints.add(new Constraint(new int[] {  0,  1,  2,  3 }));
    constraints.add(new Constraint(new int[] {  4,  5,  6,  7 }));
    constraints.add(new Constraint(new int[] {  8,  9, 10, 11 }));
    constraints.add(new Constraint(new int[] { 12, 13, 14, 15 }));

    // Column constraints
    constraints.add(new Constraint(new int[] {  0,  4,  8, 12 }));
    constraints.add(new Constraint(new int[] {  1,  5,  9, 13 }));
    constraints.add(new Constraint(new int[] {  2,  6, 10, 14 }));  
    constraints.add(new Constraint(new int[] {  3,  7, 11, 15 }));
    // Box constraints
    constraints.add(new Constraint(new int[] {  0,  1,  4,  5 })); 
    constraints.add(new Constraint(new int[] {  2,  3,  6,  7 }));  
    constraints.add(new Constraint(new int[] {  8,  9, 12, 13 }));  
    constraints.add(new Constraint(new int[] { 10, 11, 14, 15 })); 
    Board result = new Solver().solve(board, constraints);

    Board expected = new Board(new int[] { 
            1,2, 4,3,
            3,4, 1,2,
            2,1, 3,4,
            4,3, 2,1
    assertEquals(expected, result); 
public void testBigBoardWiki() {
    Board board = new Board(new int[] {
            5,3,0, 0,7,0, 0,0,0,
            6,0,0, 1,9,5, 0,0,0,
            0,9,8, 0,0,0, 0,6,0,
            8,0,0, 0,6,0, 0,0,3,
            4,0,0, 8,0,3, 0,0,1,
            7,0,0, 0,2,0, 0,0,6,
            0,6,0, 0,0,0, 2,8,0,
            0,0,0, 4,1,9, 0,0,5,
            0,0,0, 0,8,0, 0,7,9

    ArrayList constraints = constraints9x9();

    Board result = new Solver().solve(board, constraints);

    Board expected = new Board(new int[] {
           5,3,4, 6,7,8, 9,1,2,
           6,7,2, 1,9,5, 3,4,8,
           1,9,8, 3,4,2, 5,6,7,

           8,5,9, 7,6,1, 4,2,3,
           4,2,6, 8,5,3, 7,9,1,
           7,1,3, 9,2,4, 8,5,6,

           9,6,1, 5,3,7, 2,8,4,
           2,8,7, 4,1,9, 6,3,5,
           3,4,5, 2,8,6, 1,7,9
    assertEquals(expected, result);      
private ArrayList constraints9x9() {
    ArrayList constraints = new ArrayList();

    // Row constraints
    constraints.add(new Constraint(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 }));
    constraints.add(new Constraint(new int[] { 9, 10, 11, 12, 13, 14, 15, 16, 17 }));
    constraints.add(new Constraint(new int[] { 18, 19, 20, 21, 22, 23, 24, 25, 26 }));
    constraints.add(new Constraint(new int[] { 27, 28, 29, 30, 31, 32, 33, 34, 35 }));
    constraints.add(new Constraint(new int[] { 36, 37, 38, 39, 40, 41, 42, 43, 44 }));
    constraints.add(new Constraint(new int[] { 45, 46, 47, 48, 49, 50, 51, 52, 53 }));
    constraints.add(new Constraint(new int[] { 54, 55, 56, 57, 58, 59, 60, 61, 62 }));
    constraints.add(new Constraint(new int[] { 63, 64, 65, 66, 67, 68, 69, 70, 71 }));
    constraints.add(new Constraint(new int[] { 72, 73, 74, 75, 76, 77, 78, 79, 80 }));

    // Column constraints
    constraints.add(new Constraint(new int[] { 0, 9, 18, 27, 36, 45, 54, 63, 72 }));
    constraints.add(new Constraint(new int[] { 1, 10, 19, 28, 37, 46, 55, 64, 73 }));
    constraints.add(new Constraint(new int[] { 2, 11, 20, 29, 38, 47, 56, 65, 74 }));
    constraints.add(new Constraint(new int[] { 3, 12, 21, 30, 39, 48, 57, 66, 75 }));
    constraints.add(new Constraint(new int[] { 4, 13, 22, 31, 40, 49, 58, 67, 76 }));
    constraints.add(new Constraint(new int[] { 5, 14, 23, 32, 41, 50, 59, 68, 77 }));
    constraints.add(new Constraint(new int[] { 6, 15, 24, 33, 42, 51, 60, 69, 78 }));
    constraints.add(new Constraint(new int[] { 7, 16, 25, 34, 43, 52, 61, 70, 79 }));
    constraints.add(new Constraint(new int[] { 8, 17, 26, 35, 44, 53, 62, 71, 80 }));

    // Box constraints
    constraints.add(new Constraint(new int[] { 0, 1, 2, 9, 10, 11, 18, 19, 20 }));
    constraints.add(new Constraint(new int[] { 3, 4, 5, 12, 13, 14, 21, 22, 23 }));
    constraints.add(new Constraint(new int[] { 6, 7, 8, 15, 16, 17, 24, 25, 26 }));
    constraints.add(new Constraint(new int[] { 27, 28, 29, 36, 37, 38, 45, 46, 47 }));
    constraints.add(new Constraint(new int[] { 30, 31, 32, 39, 40, 41, 48, 49, 50 }));
    constraints.add(new Constraint(new int[] { 33, 34, 35, 42, 43, 44, 51, 52, 53 }));
    constraints.add(new Constraint(new int[] { 54, 55, 56, 63, 64, 65, 72, 73, 74 }));
    constraints.add(new Constraint(new int[] { 57, 58, 59, 66, 67, 68, 75, 76, 77 }));
    constraints.add(new Constraint(new int[] { 60, 61, 62, 69, 70, 71, 78, 79, 80 }));
    return constraints;


This snapped together fairly well. It's taken me longer to write it up than it did to solve.

The oddest part for me was my first test. I wanted a test that was something like the real problem, and this was a good test for that. But it was a larger first step than I usually use. I suspect this is because I had a solution "trail" in mind.

There's still room for some refactoring, though Java doesn't make it particularly easy to get rid of similar loops. You could pull out a Constraints class (rather than the ArrayList); Solver.solved() and Solver.valid() could move over there. Or maybe Solver really should just be Constraints.

There's some room for work in the Board class. I tried it with clone() one time and with a constructor the other. A third alternative would be to put a copy-and-modify operation on Board (having the board generate a copy of itself with the cells modified). I'd keep that up my sleeve if I needed to later.

This had some of the flavor of re-implementing a known algorithm: I certainly had a certain approach in mind when I started. The steps didn't feel so big when I did them the first time, but I felt a strong urge to fill in the blanks while writing it up. (I wonder which impulse is better.)

The algorithm itself turned out to be as nice as I expected. The setting up of constraints is somewhat ugly, but it was quick and easy. I made my list and checked it twice. If I were generalizing beyond this case, I'd definitely revisit constraint setup.

The biggest surprise for me was in performance. I had fully expected this program to need a lot of performance tuning. After all, I didn't do anything to cut down the huge search space. But even the hardest boards I've tried run in about a second.


[Written July 10, 2006.]

Sufficient Completeness and Testing

One way to determine "sufficient completeness" suggests considering all sequences of calls that can take an object to a state. This enumeration suggests important test cases.

How do you know an object "works"? A branch of formal methods known as axiomatic specification can give us some insights.

Mutators and Accessors

We talk of an object as having state and methods to encapsulate some idea. Some methods let us look at the object: these are accessors. (It's immaterial whether these compute a value or return a stored value.) Other methods change an object; these are mutators. Some methods do both, e.g., a pop() method that reads the top of the stack, pops it, and returns it. For our purposes here, split those into two methods. (Martin Fowler calls this refactoring Separate Query from Modifier.)

This analysis is an abstraction. It doesn't deal with objects that change spontaneously or with threads. But that's ok – many interesting objects don't have those challenges.

The first insight is this: for many objects, the state of the object is determined by the sequence of mutator operations called. We can use a "nested function call" expression to represent this.

To be concrete, we'll work with a Money example. For starters, we can create some new Money, add to it, and find its value. If we look at:

            value(add(10, add(2, new)))

we expect "12" back.

Here's where the axiomatic part comes in. We can specify the value() method's behavior using axioms (rules):

      value(new) = 0
      value(add(n, m)) = n + value(m)

("n" is an integer, "m" is a Money object.) With these rules,

value(add(10, add(2, new)))
= 10 + value(add(2, new))
= 10 + 2 + value(new)
= 10 + 2 + 0
= 12
, as expected.

Let's add another mutator into the mix, subtract:

            value(subtract(n, m)) = value(m) – n

Note that this definition lets us have negative amounts; e.g.,

value(subtract(2,new)) = -2

Sufficient Completeness

Sufficient completeness is the idea that we have defined things in such a way that we've assigned a value to any expression. In object terms, this says that our methods know how to work regardless of the sequence of mutator operations that led to the current state of the object.                                      

One simple way to assure sufficient completeness is to look at all the ways an object can be formed, and make sure each accessor is defined for all those possibilities.

In this case, it's fairly simple to show, using an inductive argument. The base case is new. We've defined value(new) directly, so that's ok. The other cases are add(n,m) and subtract(n,m). In both cases, we've defined value for the two forms, in terms of (the simpler) value(m). By induction, we've defined value for all sequences.


This approach to sufficient completeness suggests a testing strategy: test each accessor for each possible expression form. In this case, we expect to see value tested at least three ways: on new Money, on Money that's been added to, and Money that's been subtracted from. If we don't cover at least those three possibilities, an error might creep in. (We don't want to omit the other domain either – integers. A minimal test would use integers that are negative, zero, and positive.)

From Money to Wallet

Let's try a variation on Money – a Wallet. A Wallet has remove instead of subtract, but they key difference is that you can only remove a bill you've added earlier. The first two definitions are the same:

            value(new) = 0  

      value(add(m, w)) = n + value(w)

When we go to define value(remove(m,w)), we realize that removing depends on what w is. If w contains a bill of denomination n, we can manage it; otherwise we have a problem. To go down this path, we'll introduce some rules for remove:

            remove(n, new) = error
      remove(n, add(m, w)) = if m==n then w else add(m, remove(n, w))


Are these definitions sufficiently complete? This time, the argument is trickier:

            value(new) – defined directly
            value(add(n, w)) – defined directly   
            value(remove(n, w)) – defined only if remove can always be converted to new and add

            remove(n, new) – defined as a failure

            remove(n, add(m, w)) – defined – but notice that we define it into an expression of the same length in one case. We have to convince ourselves that this expression is somehow simpler. It is – remove(n,w) is simpler than the longer expression – it pushes the remove to the right where it will either hit the first case of the if clause, or it will try to remove from an empty wallet. Moving the add to the front is no problem; value is defined for add directly.

            remove(n, remove(m, w)) – (This case is easy to forget, but the rules for sufficient completeness tell us to systematically consider each possibility.) If w begins with new or add, then we're covered by the above. If not, apply this rule again until we get to such a w. The result can always be reduced to new and add, so the approach will terminate.

Therefore: yes, our definition is sufficiently complete.

This suggests a number of test cases. We certainly want to get the value of new, add, and remove. For remove, we have four cases: remove on a new, remove on an add with the same value, remove on an add with a different value, and remove from a wallet that's been removed from before. Our arguments for sufficient completeness suggest a good starting set of test cases.

value(add(2, new))
value(remove(1, new))
value(remove(2, add(2, new)))
value(remove(1, add(5, new)))
value(remove(2, remove(5, add(2, add(5, new))))

In Practice

In practice: I rarely sit down and try to write axioms for objects (perhaps once in a year). Even then I don't force myself to be formal and complete.

But I use these ideas day to day:

  • An object's state results from the sequence of mutator calls
  • Each method needs to do its job for every call sequence that led to the current state
  • The possible sequences of mutator calls suggest test cases that need to be covered

Our demonstration of sufficient completeness used a "combinatorial" approach: check every method on every possible call sequence form. This provides a systematic way to identify important values to test.

[Written April 12, 2006.]

A Comparison Algorithm for TDD

Can you test-drive an algorithm?

There's been discussion in the abstract about whether you can test-drive an existing algorithm. To make it more concrete, I've transliterated a difference algorithm from C to Java. (It's the "insert-delete" difference algorithm from Webb Miller's Software Tools Sampler.)

What would constitute a solution? I think it could take one of two forms:

  1. A series of test cases (perhaps with code samples) that would show deriving this algorithm one bit at a time. It isn't enough to just put test cases for each line; the test cases have to make sense in a way that drives the algorithm forward from simpler cases to more complex ones.
  2. A series of difference algorithms, each more sophisticated than the next, and "in range" of additional tests and refactoring.


  • The original version read in files; I've simplified it to just take two arrays of objects.
  • I've done a little bit of renaming, and extracted a helper method.
  • The algorithm considers a matrix of moves, by increasing distance. It walks along the diagonals, looking for the best moves.
  • ORIGIN is used to create the effect of a negative index.
  • The algorithm builds up partial scripts, accumulating the best one it can out of previously formed scripts. Then at the end, it flips the script around so the steps come in the order you'd expect.
  • The "test" doesn't capture expected output – it just writes it to stdout.


One challenge is to create a solution as described above.

The next challenge is to say how much knowing the solution pushed you to write particular tests in a particular order. Would you have written those tests if you'd had a different algorithm in mind, or didn't know an algorithm at all?


import java.io.PrintStream;

public class Diff {
    public class EditScript {
        public final static int DELETE = 1;
        public final static int INSERT = 2;

        int op;
        int line1;
        int line2;

        Object object;

        EditScript link;

        public String toString() {
            String opString = "UNKNOWN";
            if (op == 1)
                opString = "DELETE";
            if (op == 2)
                opString = "INSERT";

            return "Edit: " + opString + " line1: " + line1 + " line2: "
                    + line2 + " obj:" + object;

    public static int MAXFILE = 2000;
    public static int ORIGIN = MAXFILE;

    public void compare(PrintStream out, Object[] in1, Object[] in2) {
       int col, distance, lower, k, m, maxDistance, n, row, upper;

       int[] last_d = new int[2*MAXFILE+1];
       EditScript[] script = new EditScript[2*MAXFILE+1];
       EditScript newEdit;

       maxDistance = 2 * MAXFILE;

       m = in1.length;
       n = in2.length;

       for (row = 0; row < m && row < n && in1[row].equals(in2[row]); row++)
       last_d[ORIGIN] = row;
       script[ORIGIN] = null;

       lower = (row == m) ? ORIGIN + 1 : ORIGIN - 1;
       upper = (row == n) ? ORIGIN - 1 : ORIGIN + 1;

       if (lower > upper) {
           out.println("Files identical");

       for (distance = 1; distance <= maxDistance; distance++) {
           // for each relevant diagonal
           for (k = lower; k <= upper; k+=2) {
               newEdit = new EditScript();

               // Move down from last d-1 on diagonal k+1 puts you
               // further along diagonal k than does moving right from last d-1
               // on diagonal k-1
               boolean moveDownBeatsMoveRight = (k == ORIGIN - distance) || k != ORIGIN+distance && last_d[k+1] >= last_d[k-1];

               if (moveDownBeatsMoveRight) {
                   row = last_d[k+1] + 1;
                   newEdit.link = script[k+1];
                   newEdit.op = EditScript.DELETE;
                   newEdit.object = in1[row-1];
               } else {
                   // move right from last d-1 on diagonal k-1
                   row = last_d[k-1];
                   newEdit.link = script[k-1];
                   newEdit.op = EditScript.INSERT;
                   newEdit.object = in2[row + k - ORIGIN - 1];

               newEdit.line1 = row;
               col = row + k - ORIGIN;
               newEdit.line2 = col;
               script[k] = newEdit;

               // slide down the diagonal
               while (row < m && col < n && in1[row].equals(in2[col])) {
               last_d[k] = row;

               if (row == m && col == n) {         // hit southeast corner, have the answer
                   print(out, script[k], in1, in2);

               if (row == m)  // hit last row, don't look to the left
                   lower = k+2;

               if (col == n)     // hit last column; don't look to the right
                   upper = k-2;

       exceeds(out, distance);

    private EditScript reverse(EditScript start) {
        EditScript ahead;
        EditScript behind;
        EditScript result;

        ahead = start;
        result = null;
        while (ahead != null) {
            behind = result;
            result = ahead;
            ahead = ahead.link;
            result.link = behind; // flip the pointer
        return result;

    private void print(
            PrintStream out,
            EditScript start,
            Object[] in1,
            Object[] in2)
        EditScript script = reverse(start);

        while (script != null) {
            if (script.op == EditScript.INSERT)
                out.println("Insert after line " + script.line1 + ":\t" + in2[script.line2 - 1]);
            else {
                EditScript next = script.link;

                boolean change = next != null && next.op == EditScript.INSERT
                        && next.line1 == script.line1;

                if (change) {
                    out.println("Change line " + script.line1 + " from "
                            + in1[script.line1 - 1] + " to "
                            + in2[next.line2 - 1]);
                    script = script.link; // skip insert

                } else
                    out.println("Delete line " + script.line1 + ":\t"
                            + in1[script.line1 - 1]);
            script = script.link;

    private void exceeds(PrintStream out, int d) {
        out.println("At least " + d + " line(s) inserted or deleted");


import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

import junit.framework.TestCase;

public class DiffTest extends TestCase {
   private final String crlf = "\r\n";
   private final String tab = "\t";

   public void testZeroLength() {
       final String result = performTest(new String[0], new String[0]);
       assertEquals("Files identical" + crlf, result);

   public void testOneLineEquals() {
       final String[] oneLineArray = new String[] { "one" };
       final String result = performTest(oneLineArray, oneLineArray);
       assertEquals("Files identical" + crlf, result);

   public void testDeleteOne() {
       final String[] arrayOne = new String[] { "head", "delete", "tail", };

       final String[] arrayTwo = new String[] { "head", "tail", };

       final String result = performTest(arrayOne, arrayTwo);
       assertEquals("Delete line 2:" + tab + "delete" + crlf, result);

   public void testInsertOne() {
       final String[] arrayOne = new String[] { "head", "tail", };

       final String[] arrayTwo = new String[] { "head", "insert", "tail", };

       final String result = performTest(arrayOne, arrayTwo);
       assertEquals("Insert after line 1:" + tab + "insert" + crlf, result);

public void testDiff() {
   final String[] arrayOne = new String[] {
       "Line1",      // 1 = new line 6
       "then",       // 2 = new line 7
       "different",  // 3 = "Change line 3 from different to extra" (line 8)
       "that",       // 4 = new line 9
       "last",       // 5 = "Delete line 5"
       "new tail"    // 6 = common tail

   final String[] arrayTwo = new String[] {
       "New first",  // 1 = "Insert [new line] after line 0"
       "new 2d",     // 2 = "Insert [new line] after line 0"
       "more",       // 3 = "Insert [new line] after line 0"
       "more",       // 4 = "Insert [new line] after line 0"
       "more",       // 5 = "Insert [new line] after line 0"
       "Line1",      // 6 = old line 1
       "then",       // 7 = old line 2
       "extra",      // 8 = "Change line 3 from different to extra"
       "that",       // 9 = old line 4
       "new tail"    // 10 = common tail

   final String result = performTest(arrayOne, arrayTwo);

       "Insert after line 0:" + tab + "New first" + crlf +
         "Insert after line 0:" + tab + "new 2d" + crlf +
         "Insert after line 0:" + tab + "more" + crlf +
         "Insert after line 0:" + tab + "more" + crlf +
         "Insert after line 0:" + tab + "more" + crlf +
         "Change line 3 from different to extra" + crlf +
         "Delete line 5:" + tab + "last" + crlf,

    public void testFigureOneDiff() {
       final String[] arrayOne = new String[] { "A", "B", "C", "A", "B", "B", "A" };
       final String[] arrayTwo = new String[] { "C", "B", "A", "B", "A", "C" };

       final String result = performTest(arrayOne, arrayTwo);
           "Delete line 1:" + tab + "A" + crlf +
             "Delete line 2:" + tab + "B" + crlf +
             "Insert after line 3:" + tab + "B" + crlf +
             "Delete line 6:" + tab + "B" + crlf +
             "Insert after line 7:" + tab + "C" + crlf,

    private static String performTest(
        final String[] arrayOne,
        final String[] arrayTwo)
       //  Setup.
       final ByteArrayOutputStream byteArrayOutput = new ByteArrayOutputStream();
       final PrintStream printStreamOutput = new PrintStream(byteArrayOutput);

       //  Perform Comparison.
       new Diff().compare(printStreamOutput, arrayOne, arrayTwo);

       //  Fetch result, as a String.
       final String result = byteArrayOutput.toString();
       return result;

[June, 2005. Thanks to Jeff Grigg for providing these test cases.]

Semantics of Fit: A Path Toward New Tools

Fit's standard interpretation tells us how well a program does against a set of test cases. We can design new semantics for reporters (that give us interesting information) and for rewriters (that make interesting transformations of our tests).

The Standard Interpretation

A Fit test looks like a table (or a series of tables) in an HTML document:

x y plus() times()
2 1 3 2
0 3 3 0
2 -1 1 -2


start ScientificCalculator  
enter value 2
press plus  
enter value 1
press equals  
check value 3

But what does a Fit test mean?

This is almost a trick question. One answer is: whatever people agree it means. Another answer is: whatever the Fit framework, the fixtures, and the application make it mean. These two things don't always line up, unfortunately.

Even when we think we're talking about the same interpretation, there can be differences in what a test means. One version of the Calculator fixture could work with the business objects; another could work at a presentation layer; another could work by manipulating the user interface. These fixtures might find different types of defects; they might be easier or harder to write; and so on.

These all generally assume a standard interpretation: use Fit to run the fixture on the table. The good side of this is that it gives us something we care about: an answer to the question, "How well does our program work?"

I'd like to move to a different question:

What can we know about our tests if we don't understand the fixtures?

Alternative Semantics

A different interpretation of the directory containing my test above might yield this table as a result:

Fixture File Table Number
Calculator MyTest.html


Calculator OtherTest.html 3
fit.ActionFixture MyTest.html 2

This is a cross-reference chart, showing where every fixture is used. I can get this information with no reference to the fixture implementation at all: it's just an index of the first cell of each table.

We can find out other interesting things with just a little knowledge of fixtures. For example, if we know that "Calculator" above is a ColumnFixture, then we know that the first row consists of input and outputs. Even without knowing anything about how the fixture is implemented, we can create another interesting table:

Fixture Field Value
Calculator plus() 1
Calculator plus() 3
Calculator times() -2
Calculator times() 0
Calculator times() 2
Calculator x


Calculator x


Calculator y -1
Calculator y 1
Calculator y 3

This table gives you a picture of the input and output domains. A good tester could look at this and notice all kinds of things that weren't tested:

  • no large or small numbers (in either the input or output)
  • no non-numbers
  • no sums that resulted in 0
  • no sums that were negative
  • no sums that were even
  • only even numbers for x, odd numbers for y
  • minus(), divide()
  • etc.

We could create a tool that would give us a domain analysis of our test data:

Test Data for Calculator
Field Max Neg Neg Zero Pos Even Odd Max Pos
plus() no no no yes no yes no
times() no yes yes yes yes no no
x no no yes yes yes no no
y no yes no yes no yes no


Reporters and Rewriters

Two types of interpretations stand out for me:

Reporters – tell you something about a set of tests

Rewriters – change a set of tests

The Index and Vocabulary examples above are reporters. The standard Fit interpretation is close to being a rewriter: it produces a version of the input file, modified to show the results of the tests. (The only thing that keeps it from being a true rewriter is that it leaves the original file in place, so you can run it again.)

Here are some more ideas for useful tools:

  • Count test cases- A reporter telling the total number of test cases for a fixture
  • Count fixture – A reporter telling how many times a fixture occurs
  • Operators – A reporter telling the column names in RowFixtures and ColumnFixtures (or the second column of ActionFixtures)
  • Operands – A reporter telling the data values used (like "Vocabulary" above)
  • Column changer – A rewriter that can rename, delete, or insert a column
  • Cell rewriter – A rewriter that can change cell values (for a specific fixture in a specific column or row)

(I've created a simple version of the Index example above using the AllFiles fixture; the others are speculation.)

Reflection is OK

There are useful semantics that don't try to interpret a fixture, but it can help to "peek" a little. For example, knowing that something is a ColumnFixture tells us that it's likely that the row after the fixture name consists of input and output fields. We can use this information fruitfully. The Vocabulary example above made use of this knowledge.

Furthermore, there is nothing wrong with getting help. If someone had a new type of fixture that subclassed Fixture, but still had ColumnFixture-like semantics, they could provide a helper analysis class that would let us know this.

The goal is not to avoid using fixture-aware code, it's just to avoid the quagmire of trying to interpret another program.

Call to Action

We've had a few years to work with Fit. People are creating test suites large enough to be interesting, and large enough that they need help managing them.

It's time to experiment with new interpretations of Fit tests. (We still may use Fit to help with this task.) The need is there now, by real people doing real work.

[June, 2005.]

Procedural and Declarative Tests

Procedural tests focus on a series of steps; declarative tests work from the input and states. Procedural tests are natural to write, but consider whether a declarative test expresses your intent more concisely and clearly.

Test Styles

There are two common styles of writing tests. One style might be called procedural: a test consists of a series of steps, each acting on or testing the state of the system. The second style might be called declarative: given a state of the system and some inputs, test the resulting state and the outputs. (These terms are from the theory of programming languages.) If we were using Ward Cunningham's fit testing framework, we might think of these as prototypically ActionFixtures vs. ColumnFixtures.

Let's look at a small, concrete example: a simple login screen.

We'd like to test that the OK button is highlighted only if both a username and a password have been specified.

Here's a test in the procedural style:

1 enter username bob
2 expect OK inactive
3 enter password pw
4 expect OK active
5 clear username  
6 expect OK inactive
7 clear password  
8 expect OK inactive

Here's a test of the same capability in a declarative style:

  Username Password OK active?
1 none none no
2 bob none no
3 none pw no
4 bob pw yes

There's a sense in which the first test is more natural to write: it tells you what to do, step by step. But consider some advantages of the second style:

  • If we want to know whether a particular test case is covered, the second style shows it more directly. For example, what if the username and password are both empty? It's easy to see that this case has been considered in the second test.
  • The declarative style fails or passes one row at a time. The procedural style is vulnerable, in that if a check in the middle fails, the whole rest of the test may be invalid. (This isn't "free" though – the code connecting the test to the system must be designed to make this test independence work.)
  • The declarative style has all the critical state, inputs, and outputs listed explicitly. In the procedural style, you may have to trace the state through many steps.
  • The procedural style tends to presume it knows details about the interface; this makes it more brittle to change.

Keeping State

Real systems may involve hidden state: state that affects the system but is not directly set or seen. Consider this example:

This is an accumulator: it provides a running sum and average. Obviously, it must somehow keep track of n, the number of items entered, in order to compute the average.

We can make a declarative test out of this by exposing the hidden state to the test:

n sum data sum' avg' comment
1 100 100 200 100 normal
9 100 0 100 10 add 0
2 2 0 2 0 integer divide

The first two columns, n and sum, represent existing state. Data is the number entered on the screen, and sum' and avg' are the results.

Getting to the hidden state can be a trick. Sometimes the programmers can explicitly expose it as part of a testing interface; other times the state can be accessed by a sequence of steps. In this case, we could setup the proper state for each row by doing this:

Hit the clear button
Repeat n-1 times: add 0
Add sum
[Optional: check sum]

Then the rest of the row might be interpreted as:

Add data
Check sum
Check avg

This puts some burden on the test setup code. But if the setup is not done there, it's probably repeated in a bunch of scripts.

When to Use the Procedural Style?

Procedural style pays off best when the sequential nature of the feature is what's interesting: when how you got somewhere is as important as where you are. Consider the problem of multiple selection, using click, shift, control, and drag. For setup, imagine a vertical list of items, numbered 1 through 10. Consider this test:

1 click 5  
2 release    
3 expect selected 5
4 expect last 5

Or this one:

1 click 5  
2 drag to 7  
3 drag to 3  
4 release    
5 control    
6 click 7  
7 expect selected 3,4,5,7
8 expect last 7

You could perhaps develop a set of declarative tests for these, but the sequence of actions is what's interesting.

What to do?

Leverage procedural tests for the cases where the sequence of actions is paramount, and there's not a lot of commonality between scripts. Scenario tests can benefit from the procedural style.

A declarative test is like a table you could put in a user manual: it concisely and concretely explains a function. Declarative tests sometimes require extra setup, especially when hidden state is involved, but they're often worth that extra trouble. Declarative tests are good for testing permutations of a business rule.

If all else is equal, favor the declarative style.

These guidelines can boost your test-writing efficiency: they move repetitive actions into the test setup, and let you focus on the interesting part of a test.

[Written February, 2005. Brian Marick suggested the terms, though I'm not sure I picked the set he liked best.]

Overview of “Extreme Programming Explained, 2/e”

Kent Beck has released a new edition of Extreme Programming Explained. This note discusses some highlights and compares it to the first edition.

Quick Summary

The second edition of Extreme Programming Explained is out.  This note is just a quick summary (read the original!), with some comments of mine in italics.

There’s an added value, Respect. There are more practices, organized as primary and corollary practices. Primary practices can stand alone; corollary practices need the support of other practices. These work together to make adopting XP be able to be more incremental than "try all these together." This book suggests a simpler approach to planning. Finally, there’s good material on the philosophy and background of XP.

Even if you’re familiar with the first edition, this book gives you a better picture of what XP means.


What is XP?

  • a mechanism for social change
  • a style of development
  • a path to improvement
  • an attempt to reconcile humanity and productivity
  • a software development discipline

These point to more ambitious goals than "a dozen developers in a room" that the first edition mostly claimed.

What are its values?

  • Communication
  • Simplicity
  • Feedback
  • Courage
  • Respect

"Respect" is listed as a new value.

What are its principles?

  • Humanity: balancing individual and team needs
  • Economics: built on the time value of money and the option value of systems and teams
  • Mutual benefit: "the most important XP principle and the most difficult to adhere to"
  • Self-similarity: make the small echo the large (and vice versa)
  • Improvement
  • Diversity
  • Reflection
  • Flow: from lean manufacturing, not from psychology – deliver a steady flow of value by engaging in all development activities simultaneously
  • Opportunity: see problems as opportunities
  • Redundancy
  • Failure: "If you’re having trouble succeeding, fail."
  • Quality
  • Baby steps
  • Accepted responsibility


There are still practices in XP (more than ever). Now, they’re divided into primary practices and corollary practices. Primary practices are ones that are generally safe to introduce one at a time, or in any order. Corollary practices are riskier: they require the support of other practices.

The approach to introducing practices is a lot more gentle-sounding: there’s the idea that you can change yourself, not impose practices on others. Beck advises not changing too fast. This all sounds more gentle than "do all 12 practices" that came through from the first edition.

Primary Practices

These are generally similar to many earlier practices, turning up the knobs a little.

Sit Together: but "tearing down the cubicle walls before the team is ready is counter-productive"

Whole Team: a cross-functional team.

Informative Workspace: a workspace that meets human needs, and a place for big visible charts.

Energized Work: a reinterpretation of "40-hour week" and "sustainable pace"

Pair Programming

Stories: "units of customer-visible functionality." And, "Every attempt I’ve seen to computerize stories has failed to provide a fraction of the value of having real cards on a real wall."

Weekly Cycle: an iteration: plan, write tests & code.

Quarterly Cycle: plan a quarter using themes.

Slack: include things that can be dropped if you get behind.

Ten-Minute Build: automatically build and test everything.

Continuous Integration

Test-First Programming

Incremental Design

Corollary Practices

Real Customer Involvement

Incremental Deployment

Team Continuity

Shrinking Teams: a proposal to reduce teams by making one person as idle as possible, rather than easing the load on everybody. Another element derived from lean thinking.

Root Cause Analysis: the five whys.

Shared Code

Code and Tests: as the primary permanent artifacts.

Single Code Base: one code stream. "Don’t make more versions of your source code… fix the underlying problem."

Daily Deployment

Negotiated Scope Contract

Pay-per-use: "money is the ultimate feedback"

First Edition Practices

For comparison:

The Planning Game: Quarterly Cycle and Weekly Cycle address this. The planning approach in 2/e is simpler.

Small Releases: Incremental Deployment and Daily Deployment carry this much further.

Metaphor: this was always the least well understood practice, and discussion of it has been more or less eliminated from this edition.

Simple Design: Incremental Design and to some extent Single Code Base.

Testing: Test-First Programming

Refactoring: Not called out; I think of it as part of Incremental Design.

Pair Programming

Collective Ownership: Now called Shared Code.

Continuous Integration

40-Hour Week: Energized Work, and to some extent Slack, cover this.

On-Site Customer: Sit Together, Whole Team, and Real Customer Involvement cover this area.

Coding Standards: Not called out explicitly; probably more of a consequence of Shared Code and Pair Programming.

As you can see, some are the same, some have a name change, and some (notably metaphor and the more programming-oriented practices of refactoring and coding standards) are not discussed.



There’s a strategy for all levels of planning:

  • List the items
  • Estimate
  • Set a budget
  • Agree on the work to be done (without changing estimates or budgets)

Planning should involve the whole team.

Beck now suggests estimating in real pair hours (two people working together for an hour). This is a shift from the relative estimates used before.


Testing is built around two principles:

  • Double Checking (e.g., test and code)
  • Defect Cost Increase (making it cheaper to fix it now)

The argument in 1/e about a flat cost of change curve is gone. Instead, the Defect Cost Increase is leveraged to identify testing as important.

Automated customer and programmer tests are important. Use test-first, at both levels.


Design is incremental: "design always." "Once and only once" is still important. Kent suggests these guidelines for simplicity:

  1. Appropriate for the intended audience
  2. Communicative
  3. Factored
  4. Minimal

He has a nice chart comparing design based on instinct, versus thought, versus experience. Sometimes an instinctual design may be good enough, other times, the thought design is good enough, other times experience is required.

"Designing software is not done for its own sake in XP. Design is in service of a trust relationship between technical and business people."

"The price of this strategy is that it requires the discipline to continue investing in design throughout the life of the project and to make larger changes in small steps, so as to maintain the flow of valuable new functionality."


There are different types of scaling.

To scale according the number of people:

  1. Turn the problem into smaller problems
  2. Apply simple solutions
  3. Apply complex solutions if any problem is left.

Uses the "conquer and divide" strategy. "Chip away at complexity while continuing to deliver."

Safety- and security-critical software may require some modest changes.



After Winslow Taylor. The consequences are bad, resulting from separation of planning from execution, and having a separate quality department.

Toyota Production System

"Every worker is responsible for the whole production line."

Ongoing improvement comes from elimination of waste – kaizen.

"If you use a part immediately, you get the value of the part itself as well as information about whether the upstream machine is working correctly." […] "The greatest waste is the waste of overproduction. Software development is full of the waste of overproduction."

Applying XP

(Note: not "adopting" XP.)

There are many ways to start.

Change yourself first, then offer the fruit of that to others. "Continuous" learning is not really continuous.

Two types of groups find it easiest to start XP: those aligned with its values, and those in pain.


There’s not a binary answer to "Am I doing XP? "The goal is successful and satisfying relationships and projects, not membership in the XP club."


Think "multi-site," not "offshore."

"Jobs aren’t going in search of low salaries. Jobs are going in search of integrity and accountability."

"Without dramatic improvement, though, the global market for software will stagnate as more attractive investments are found in manufacturing and biotechnology." Kent’s been quoted as saying "All methodology is based on fear." I think this sentence captures one of the fears XP is intended to address.

The Timeless Way of Programming

"Harmony and balance are the aims of XP."

Community and XP

"A supportive community is a great asset in software development."


The first edition was a manifesto to programmers. The new edition has a broader audience.

The practices are more extreme, the rhetoric less so. The mechanics of programming XP-style are a little less explicit (but there are certainly plenty of books out there on test-driven development, refactoring, and so on.) The philosophy shines through more clearly.

The result is a worthy follow-on.

Extreme Programming Explained, by Kent Beck and Cynthia Andres. Addison-Wesley Professional, 2004. ISBN 0321278658.

[Written January, 2005.]