The Impact of Refactoring on Tests

When a refactoring changes the design of a system, it updates the tests just enough to accommodate the revision. This keeps the tests passing, but tests have other roles as well. To continue supporting those roles, you often need to modify the tests further and add new ones.
[This article originally published at InformIT.com, January, 2004.]

Roles of Automated Tests

Automated tests support a number of goals:

  • Assuring us that a system does what it’s intended to do
  • Supporting refactoring by catching mistakes in manual refactoring
  • Helping to design the system (when using test-driven development)
  • Documenting the way in which internal parts of the system are used

There are different ways to classify automated tests. Figure 1 arranges them by role. In Figure 1, “customer” refers to a team that may include testers and other specialists; “programmer” refers to the group consisting of various types of developers.

Figure 1. Tests and Their Owners

Customers own the highest-level tests (including system tests, performance tests, and so on). Customers implement some of these tests themselves; for example, by specifying test data and expected results in a spreadsheet. At other times, they get the programmers to implement the tests; for example, by noting the test cases on the back of a story card.

Customer tests are supported by a set of test fixtures that let people specify tests at a natural level of detail. Fixtures are on the border of ownership, in that customers and programmers have to negotiate their meaning. Programmers can implement fixtures in any convenient language. Fixtures usually connect to facades or other high-level classes, as customer tests usually test some end-to-end feature.
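As a sketch of the idea, here is a minimal fixture that checks rows a customer might specify in a spreadsheet against a facade. All names (DiscountFixture, PricingFacade, and the discount rule) are hypothetical and not tied to any particular fixture framework:

```java
// Sketch of a customer-test fixture (hypothetical names throughout):
// customers supply rows of inputs and expected outputs; the fixture
// drives a high-level facade rather than internal classes.
public class DiscountFixture {
    // The facade the fixture connects to, exercising an end-to-end feature.
    static class PricingFacade {
        int discountedPrice(int price, int percentOff) {
            return price - price * percentOff / 100;
        }
    }

    // One row of the customer's table: inputs and the expected result.
    static boolean checkRow(int price, int percentOff, int expected) {
        return new PricingFacade().discountedPrice(price, percentOff) == expected;
    }

    public static void main(String[] args) {
        // Two rows a customer might have written in a spreadsheet.
        boolean ok = checkRow(100, 10, 90) && checkRow(200, 25, 150);
        System.out.println(ok ? "pass" : "fail");
    }
}
```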

Programmers own and implement their own tests as well. These are typically class, unit, subsystem, and other tests. In test-driven development, this group includes the tests that drove the code to be written. The specifics of programmer tests depend on how the system is implemented.

Refactorings Affect Tests

Some refactorings inherently affect tests. Consider Rename Class, shown in Figure 2.

Figure 2. Rename Class

This refactoring will affect classes that refer to C, including the tests for C, and adjust them to refer to D. To highlight this process, we can draw Figure 3.

Figure 3. Rename Class Showing a Test Client
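As a sketch (C, D, and value() are hypothetical names matching the figures), the rename rewrites every reference, including the one in the test:

```java
// Sketch of Rename Class: the refactoring updates the test client
// along with every other client. Names are hypothetical.
public class RenameExample {
    // This class was "C" before the refactoring; only the name changed.
    static class D {
        int value() { return 42; }
    }

    // The test that used to read "new C().value()" now reads
    // "new D().value()" -- the rename updated this client too.
    static boolean testValue() {
        return new D().value() == 42;
    }

    public static void main(String[] args) {
        System.out.println(testValue() ? "pass" : "fail");
    }
}
```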

But some refactorings don’t inherently affect the clients of a class they change. Consider Extract Class, which splits a class into two parts, introducing a new class (see Figure 4).

Figure 4. Extract Class

When we include the test class, it looks like Figure 5.

Figure 5. Extract Class Showing a Test Client

The tests will call the same method body (indirectly); it’s just moved to a different class.
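A minimal sketch of that indirection, with hypothetical names (Account, TaxPolicy, and the 10% rule are invented for illustration):

```java
// Sketch of Extract Class: Account's tax computation moves to a new
// TaxPolicy class, and Account delegates to it, so the existing test
// exercises the same method body indirectly. Names are hypothetical.
public class ExtractClassExample {
    // The method body moved here from Account.
    static class TaxPolicy {
        int taxOn(int cents) { return cents / 10; }  // 10% tax
    }

    static class Account {
        private final TaxPolicy policy = new TaxPolicy();
        // The old public method remains, now delegating to TaxPolicy.
        int taxOn(int cents) { return policy.taxOn(cents); }
    }

    public static void main(String[] args) {
        // The pre-refactoring test, unchanged: it still calls Account.
        System.out.println(new Account().taxOn(1000) == 100 ? "pass" : "fail");
    }
}
```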

Another Refactoring: Extract Superclass

Suppose we Extract Superclass on a class. To do this, we’ll create a new parent class and move data and methods to it. By default, this need not affect test clients; they can still manipulate an instance of the original class, now a subclass, as shown in Figure 6.

Figure 6. Extract Superclass

But presumably the reason we extracted the new class is that we have other uses for it. We’d like to create and use other subclasses. But can these subclasses trust their new parent? It hasn’t been tested on its own, but rather only in the context of the original class. So we may need additional tests focused on the new superclass, as shown in Figure 7.

Figure 7. Extract Superclass Showing Test Clients
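A sketch of the situation, with hypothetical names (Party and Employee are invented for illustration): the old test still drives the subclass, while a new test exercises the extracted parent on its own.

```java
// Sketch of Extract Superclass: after the refactoring, the new parent
// gets focused tests of its own, so future subclasses can rely on it.
// Names are hypothetical.
public class ExtractSuperclassExample {
    // The new parent, pulled up from Employee.
    static class Party {
        private final String name;
        Party(String name) { this.name = name; }
        String getName() { return name; }
    }

    // The original class, now a subclass; its old tests still work.
    static class Employee extends Party {
        Employee(String name) { super(name); }
    }

    public static void main(String[] args) {
        // Existing test client: still manipulates an Employee.
        boolean oldTest = new Employee("Pat").getName().equals("Pat");
        // New test, focused on the superclass by itself.
        boolean newTest = new Party("Lee").getName().equals("Lee");
        System.out.println(oldTest && newTest ? "pass" : "fail");
    }
}
```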

When we create a new subclass, it will have its own test as well. But it will have some assurance that the superclass does its job properly.

Example

Many algorithms for searching in graphs or other structures have a common form, something like this:

Stack candidates = new Stack();
while (!candidates.isEmpty()) {
   Something x = (Something) candidates.pop();
   if (x.acceptable()) return x;
   pushMoreCandidates(candidates, x);
}

This version of the algorithm uses a stack to manage the candidates, but many variations don’t rely on the stack discipline: They just want a new candidate to work with. (Other disciplines include queue, priority, random, and so on.)

Suppose we extract a Candidates class to encapsulate that decision, as shown in Figure 8.

Figure 8. Extracting a Candidates Class
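A sketch of the result: the loop now asks a Candidates object for the next item instead of popping a Stack directly. Something, acceptable(), and pushMoreCandidates() are stand-ins, as in the fragment above; here integers and an evenness check substitute for the real domain.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the extraction (hypothetical domain): Candidates
// encapsulates the "which one next?" decision.
public class CandidatesExample {
    static class Candidates {
        private final Deque<Integer> items = new ArrayDeque<>();
        boolean isEmpty() { return items.isEmpty(); }
        void add(int x) { items.push(x); }   // stack discipline, for now
        int next() { return items.pop(); }
    }

    // The algorithm no longer knows it is talking to a stack.
    static int search(Candidates candidates) {
        while (!candidates.isEmpty()) {
            int x = candidates.next();
            if (x % 2 == 0) return x;        // stand-in for acceptable()
            candidates.add(x * 2);           // stand-in for pushMoreCandidates()
        }
        return -1;
    }

    public static void main(String[] args) {
        Candidates c = new Candidates();
        c.add(3);
        // 3 is odd, so 6 is pushed; 6 is even, so it is returned.
        System.out.println(search(c));
    }
}
```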

How Is It Tested?

The algorithm uses the stack to hold a set of candidate values. Suppose the original algorithm is constructed in such a way that it never generates duplicate candidates. Then no test of the algorithm will be able to ascertain whether the stack is a true stack, or one that ignores duplicates.
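A sketch of that blind spot (both stack classes are hypothetical): a test that never pushes duplicates cannot tell a true stack from one that silently drops them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: with no duplicate candidates, the test below passes for
// both implementations. Classes are hypothetical.
public class DuplicateBlindness {
    static class PlainStack {
        final Deque<Integer> items = new ArrayDeque<>();
        void push(int x) { items.push(x); }
        int pop() { return items.pop(); }
        boolean isEmpty() { return items.isEmpty(); }
    }

    // A variant that silently ignores duplicate pushes.
    static class DedupingStack extends PlainStack {
        @Override
        void push(int x) { if (!items.contains(x)) super.push(x); }
    }

    // A test driven through distinct candidates only.
    static String drain(PlainStack s) {
        s.push(1); s.push(2); s.push(3);  // no duplicates generated
        StringBuilder out = new StringBuilder();
        while (!s.isEmpty()) out.append(s.pop());
        return out.toString();
    }

    public static void main(String[] args) {
        // Both implementations look identical to this test.
        System.out.println(drain(new PlainStack()).equals(drain(new DedupingStack()))
                ? "indistinguishable" : "distinguishable");
    }
}
```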

However, once Candidates is extracted, it may be used in new contexts. If the next use requires set behavior, the algorithm might fail, or fail to terminate, without it.

A test that’s good enough before refactoring might not be good enough afterward.

Consider the tests:

  • Some are testing the algorithm with little regard to the stack.
  • Others test the stack with little regard to the algorithm.
  • Finally, some tests focus on the interaction between the algorithm and the stack.

To make our tests best accommodate the new structure, we want to move tests around. Tests of the algorithm can stay where they are. Tests of the stack can move to a new test class focusing on testing the Candidates. Extracting this class will expose some previously hidden behaviors; we need to fill out the Candidates tests by adding tests for those behaviors, as shown in Figure 9.

Creating a New Test Client

Figure 9. Creating a New Test Client

Tests of the interaction are the most interesting. Earlier, these tests tried to test the Algorithm and the Stack together. To do this, they tried to force the Stack into different states, but it may not have been possible to use the Algorithm to get the Stack into every state we’d like to test. With the Candidates class now standing outside the Algorithm, that part of the code can be tested more thoroughly on its own. This may let us simplify the interaction-focused tests in the Algorithm; those tests can concentrate on what’s interesting from the point of view of the Algorithm and not try to test Candidates as well.

Refactoring may cause you to add, move, or delete tests.

Sometimes someone says, “I need to expose the private parts of this class so I can be sure what’s going on with it.” This may indicate that there’s a hidden class, just as Candidates was hidden in Algorithm. Extracting the class lets us test it independently, and reduces the urge to expose parts of the original class unnecessarily. This also helps the tests stay robust: they’re not using internal details; they respect the class’s secret.

What Tests Should We Add?

What tests should be added in response to a design change? I follow this guideline:

Add the tests you would have added if you had created the new design via test-driven development.

This means that you’ll ask yourself, “What test would have caused me to write this line of code?” This will force you to consider each statement and why it’s there. Then add tests you feel you need to adequately test the public interface of the classes involved.
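As a sketch of working backward from a line of code to the test that would have driven it out (the max() method and its guard clause are hypothetical):

```java
// Sketch: each line of code should be traceable to a test that would
// have caused it to be written. The example method is hypothetical.
public class TestPerLine {
    static int max(int[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("empty");
        // ^ What test caused this line? A test for empty input.
        int best = xs[0];
        for (int x : xs) if (x > best) best = x;
        // ^ And this loop? A test where the max is not the first element.
        return best;
    }

    public static void main(String[] args) {
        boolean guardTested;
        try {
            max(new int[0]);
            guardTested = false;
        } catch (IllegalArgumentException e) {
            guardTested = true;
        }
        boolean loopTested = max(new int[] {1, 5, 3}) == 5;
        System.out.println(guardTested && loopTested ? "pass" : "fail");
    }
}
```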

When Should We Add Tests?

The crux of the problem is that code tested well in its original context may not work in a new context. There are times when you might adjust your tests.

One approach is not to worry about it. Trust that your tests of new features will catch any uses of code refactored to a new context. This rule has the advantage that it’s easy to follow, but it may not be as safe as a more aggressive approach. You’ll need to be sensitive to how well your tests are finding problems.

Another approach is to add tests when you use a class in a new way. When you refer to a class, check whether its tests cover the context in which you now plan to use it. If not, beef up the tests before you add code. This practice lets you create new tests “just in time,” but it requires discipline to work.

The most aggressive approach is to add and change tests just after you refactor. It may seem that this would be a hard discipline to follow, but I’ve found it easier to do it this way than the previous approaches. When adding new code, I find I don’t want to stop and backfill tests, but I’m in a more reflective mood when I’m refactoring.

You can choose between these approaches on a case-by-case basis, but pay attention to the feedback you get: Do manual refactorings cause problems, especially ones that don’t show up for a while? Do you find problems when using code in a new context? Do your tests miss any problems? If so, improve the discipline of your testing.

Bottom Line

If you focus on refactoring the system under development and do only the minimal amount of refactoring of tests, your system will be harder to change than it needs to be. You put a lot of effort into giving your system the best design you can; neglecting to update your tests leaves that system harder to change, riskier to extend, more difficult to test, and less clear than it could be.

These guidelines may help:

  • A test that’s good enough before refactoring might not be good enough afterward.
  • Refactoring may cause you to add, move, or delete tests.
  • Add the tests you would have added if you had created the new design via test-driven development.
