Tag Archives: refactoring

Review – Clean Code

Clean Code, Bob Martin, Prentice-Hall, 2008.

Bob Martin tackles the challenges of making code sparkling clean. He provides numerous guidelines, and demonstrates their utility in action. I particularly appreciated some of the longer examples where he really works them over. You’ll especially find this book compelling if you’re interested in craftsmanship, refactoring, and/or concrete design. (Reviewed Feb., ’08)

Refactoring Workbook

Premise

"Refactoring improves through practice."

The Refactoring Workbook contains exercises to help you assess and improve your ability to refactor (Java, but close enough for C# too). It's available now from Amazon.com.

Thanks to those who reviewed the drafts! If you like the book, please consider posting a review at Amazon or elsewhere.

Also available: Refactoring in Ruby, with Kevin Rutherford; it has its own website. We did an interview with InformIT too.

Table of Contents

Preface
Chapter 1 – Roadmap

Section 1 – Smells Within Classes
Chapter 2 – The Refactoring Cycle
Chapter 3 – Measured Smells  (Sample at java.net or at InformIT)
Interlude 1 – Smells and Refactorings

Chapter 4 – Names
Chapter 5 – Unnecessary Complexity
Interlude 2 – Inverses

Chapter 6 – Duplication
Chapter 7 – Conditional Logic
Interlude 3 – Design Patterns

Section 2 – Smells Between Classes
Chapter 8 – Data
Chapter 9 – Inheritance
Chapter 10 – Responsibility
Chapter 11 – Accommodating Change
Chapter 12 – Library Classes
Interlude 4 – Gen-A-Refactoring

Section 3 – Programs to Refactor
Chapter 13 – A Database Example
Chapter 14 – A Simple Game
Chapter 15 – Catalog
Chapter 16 – Planning Game Simulator
Chapter 17 – Where to Go From Here
Bibliography

Appendices
Appendix A. Selected Answers
Appendix B. Java Refactoring Tools
Appendix C. Inverses for Refactorings
Appendix D. Key Refactorings
Inside Cover – Smells and Refactorings

Reviews

"As an occasional teacher of undergraduate programming courses, I think this book is worth its weight in platinum."–Gregory Wilson, Dr. Dobb's Journal, July 2004. Page 82.

Author

William Wake (William.Wake@acm.org, www.xp123.com) is a programmer, coach, and author.

Source Code

File rwb.zip contains the source code for the longer examples in the book. (Updated Jan., 2014 to more modern Java and JUnit.)

Errata

  • Page 63. The Elements of Programming Style is by Brian W. Kernighan and P.J. Plauger, not Kernighan and Pike. (Thanks to Mike Cohn for spotting this.)
  • Page 121. The column names in the Offering table in the diagram don't match up to the names in the code. Change the getString() calls in Offering.find() to use columns "Name" and "DaysTimes". (Thanks to Glenn Boysko and Mohsen Akhavan for spotting this.)
  • Page 197. "hadMidScore" should be "hadMidRangeScore" in the code fragment:
      boolean hadMidRangeScore = (score > 500)
    (Thanks to Marco Isella for spotting this.)

Refactoring Demo Screencast

Four ways to Extract Method.

Refactoring is "improving the design of existing code". We want to do this quickly and safely, so that we're improving the code's design but not introducing bugs while we change it.

This series of videos shows four different ways of performing the refactoring "Extract Method":

The code we're improving is part of a program that lets a user unscramble sentence fragments. We're going to extract part of the toString() method shown below.

public class Board extends JPanel implements PropertyChangeListener {
    // ...

    public String toString() {
        Component[] cards = this.getComponents();
        List sortedResult = Arrays.asList(cards);
        Collections.sort(sortedResult, new BoardComparator());

        StringBuffer result = new StringBuffer();
        for (int i = 0; i < sortedResult.size(); i++)
            result.append(sortedResult.get(i).toString());

        return result.toString();
    }
}
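Each video arrives at roughly the same destination; here's a sketch of one possible result (the method and variable names are my own, and the recordings' versions may differ):

public class Board extends JPanel implements PropertyChangeListener {
    // ...

    public String toString() {
        List sortedCards = sortedCards();

        StringBuffer result = new StringBuffer();
        for (int i = 0; i < sortedCards.size(); i++)
            result.append(sortedCards.get(i).toString());

        return result.toString();
    }

    // The extracted method: sorting the components is now named and separate.
    private List sortedCards() {
        Component[] cards = this.getComponents();
        List sortedResult = Arrays.asList(cards);
        Collections.sort(sortedResult, new BoardComparator());
        return sortedResult;
    }
}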
  1. Sloppy (Flash, 0.9M)
  2. Manual, but "by the book" (Flash, 1.1M)
  3. Automated, using the IDE's refactoring support (Flash, 0.9M)
  4. Automated, at full speed (Flash, 0.5M)

[Recorded May, 2006. Tools: Eclipse, Macromedia Captivate, Audacity.]

Review – Refactoring Workbook

Refactoring Workbook, William Wake. Addison-Wesley, 2003.

[Consider this a summary rather than a review, since it's my own book.] My goals were to create a workbook that helps people practice recognizing smells (problems) and learn to apply important refactoring techniques. There's a "smell finder" inside the covers to help lead you from symptoms to solutions. The table of contents and the book's home page appear earlier on this page.
(April, '06)

Review – Refactoring to Patterns

Refactoring to Patterns, Joshua Kerievsky. Addison-Wesley, 2005.
Design patterns and refactoring have been related for a long time. (Consider that Ralph Johnson, one of the co-authors of Design Patterns, was a sponsor of the work that created the original refactoring browser.) Josh has cataloged a number of refactorings that can lead your code to (or toward) any of the best-known design patterns. The code examples are excellent: realistic, interesting, and showing what was added or deleted. I recommend it for anybody who understands the basic concepts of refactoring, and is ready to further develop their design and refactoring skills. (Reviewed Sept., ’05)

Review – Working Effectively with Legacy Code

Working Effectively with Legacy Code, Michael Feathers. Prentice Hall, 2004.
Michael has distilled a lot of knowledge about how to safely improve code when you lack the safety net of tests. For example, there are places where you’ll take smaller steps or compile more often to be sure that no new problems are introduced. The book is structured with each chapter focused around a major problem you might have. Odds are, you either need this book or know someone who does. (Reviewed Dec., ’04)

Refactoring Thumbnails

Sven Gorts has introduced what he calls Refactoring Thumbnails. These are UML-like diagrams augmented with some flows, and used to summarize refactorings. (For example, the UML might have no words, but rather squiggles to represent identical text in two different classes.)

In addition to summarizing the transformation involved in simple refactorings, he uses these to show how large refactorings can be created out of smaller ones. A nice example is Break Module Dependencies With Adapter. He shows that you can break package dependencies by doing Separate Interface from Implementation, and then Introduce Indirect Class, or by doing these in the opposite order. For Evolving to the Proxy / Decorator Pattern, he shows several approaches that end up in the same place.

I really like these summaries, and I’ll use this approach to help manage my focus on large refactorings.

Refactorings Require New Tests

Someone asked on the XP egroup about getting access to private methods for testing purposes. Others suggested a number of ways to get this effect, but it got me thinking about refactoring.

Refactoring is often thought of as a pure, safe transformation: convert a program into another program with the same semantics but a better design. From the standpoint of a refactoring tool, the "same semantics" part is crucial.

But refactoring also has a psychological side: a better design, but also a different design. A different design may induce people to act differently (indeed, that's why we do it!). In particular, a different design may give people different expectations about code.

Following are some examples. In each case, I'll assume the code was created by test-driven development, and adequately tested before the refactoring.

  • Extract Method – the code worked as part of another method (and still does). But now, the reader's going to assume they can call this method from other parts of the class. Is the extracted method tested sufficiently on its own terms? (See the sketch after this list.)
  • Expose Method (private becomes protected) – now, subclasses expect to be able to call this method (either directly or via a call to super()). We'll need to create testing subclasses to verify that it works in that context.
  • Expose Method (to public) – Other objects are free to call it. The original object no longer has control over the order in which this method is called. (We may have had a method that was only to be called after another method; while it was private, we could enforce that; once we expose the method, it's hard to enforce this obligation.)
  • Extract Class – The object now stands alone. Is there a test class testing this object by itself? You may need to extract a new test class, but you also may find you need new tests to cover everything.
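For example, the first bullet calls for a test like this. (A minimal JUnit sketch; Report and longestWord are hypothetical names, not from any particular codebase.)

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class ReportTest {
    // Hypothetical class: longestWord() was just extracted from summarize().
    static class Report {
        public String summarize(String text) {
            return "Longest word: " + longestWord(text);
        }

        String longestWord(String text) {
            String best = "";
            for (String word : text.split("\\s+"))
                if (word.length() > best.length()) best = word;
            return best;
        }
    }

    // summarize() never passes an empty string, so nothing exercised this
    // case before; now that longestWord() can be called on its own, it
    // needs tests on its own terms.
    @Test
    public void longestWordOfEmptyTextIsEmpty() {
        assertEquals("", new Report().longestWord(""));
    }

    @Test
    public void longestWordPicksTheLongest() {
        assertEquals("banana", new Report().longestWord("fig banana kiwi"));
    }
}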

Scratch Refactoring

I recently had a chance to do some refactoring of some Visual Basic code. I hadn't worked with it in several years. In particular, I hadn't worked with the object support that's in VB.Net. It's very striking how much it's like C# with different keywords.

My task was to convert some code from using web services (which were too slow) to just straight object code. Several factors came together:

  • My unfamiliarity with VB.Net and web services.
  • The fact that I was heading out of town for a week and didn't want to risk leaving problems.
  • The team's use of source control which locks checked-out files. (I'm lobbying to change this.)
  • A desire for extra care as the product has almost no automated tests.

This led me to do a scratch refactoring: refactoring with the intent of throwing it away and re-doing it. I refactored away two sets of web services, writing down each of the changes I intended to make (along with the name of the affected file).

I found several benefits:

  • I learned some little tricks for how to get the compiler to tell me what needed doing.
  • Knowing I would throw the result away let the scratch refactoring go more quickly.
  • I learned a couple tricks for the IDE.
  • When I came back after my trip (to do the real thing), I felt like I was just flying through the changes.

I think it was Brooks who said something like, "It's faster to make a 6-inch mirror and a 10-inch mirror than it is to make a 10-inch mirror." I found that true in this case.

The Impact of Refactoring on Tests

When a refactoring changes the design of a system, we typically update the tests just enough to accommodate the revision. This keeps the tests working, but tests have other roles. To continue to support those roles, you often need to further modify the tests and add new ones.
[This article originally published at InformIT.com, January, 2004.]

Roles of Automated Tests

Automated tests support a number of goals:

  • Assuring us that a system does what it's intended to do
  • Supporting refactoring by catching mistakes in manual refactoring
  • Helping to design the system (when using test-driven development)
  • Documenting the way in which internal parts of the system are used

There are different ways to classify automated tests. Figure 1 arranges them by role. In Figure 1, "customer" refers to a team that may include testers and other specialists; "programmer" refers to the group consisting of various types of developers.

Figure 1. Tests and their owners

Customers own the highest-level tests (including system tests, performance tests, and so on). Customers implement some of these tests themselves; for example, by specifying test data and expected results in a spreadsheet. At other times, they get the programmers to implement the tests; for example, by noting the test cases on the back of a story card.

Customer tests are supported by a set of test fixtures that let people specify tests at a natural level of detail. Fixtures are on the border of ownership, in that customers and programmers have to negotiate their meaning. Programmers can implement fixtures in any convenient language. Fixtures usually connect to facades or other high-level classes, as customer tests usually test some end-to-end feature.
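As a sketch of the idea (hypothetical names; this isn't any specific fixture framework): the fixture sits on the ownership border, translating each customer-authored row into a call on a high-level class.

// Hypothetical facade that customer tests drive end-to-end.
class PricingFacade {
    double discountFor(double orderTotal) {
        return orderTotal >= 100.0 ? orderTotal * 0.10 : 0.0;
    }
}

// Customers supply rows of (orderTotal, expectedDiscount);
// programmers map each row onto a facade call.
class DiscountFixture {
    private final PricingFacade pricing = new PricingFacade();

    boolean check(double orderTotal, double expectedDiscount) {
        return pricing.discountFor(orderTotal) == expectedDiscount;
    }
}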

Programmers own and implement their own tests as well. These are typically class, unit, subsystem, and other tests. In test-driven development, this group includes the tests that drove the code to be written. The specifics of programmer tests depend on how the system is implemented.

Refactorings Affect Tests

Some refactorings inherently affect tests. Consider Rename Class, shown in Figure 2.

Figure 2. Rename Class

This refactoring will affect classes that refer to C, including the tests for C, and adjust them to refer to D. To highlight this process, we can draw Figure 3.

Figure 3. Rename Class showing a test client

But some refactorings don't inherently affect the clients of a class they change. Consider Extract Class, which splits a class into two parts, introducing a new class (see Figure 4).

Figure 4. Extract Class

When we include the test class, it looks like Figure 5.

Figure 5. Extract Class showing a test client

The tests will call the same method body (indirectly); it's just moved to a different class.

Another Refactoring: Extract Superclass

Suppose we Extract Superclass on a class. To do this, we'll create a new parent class and move data and methods to it. By default, this need not affect test clients; they can still manipulate an instance of the original class, now a subclass, as shown in Figure 6.

Figure 6. Extract Superclass

But presumably the reason we extracted the new class is that we have other uses for it. We'd like to create and use other subclasses. But can these subclasses trust their new parent? It hasn't been tested on its own, but rather only in the context of the original class. So we may need additional tests focused on the new superclass, as shown in Figure 7.

Figure 7. Extract Superclass showing test clients

When we create a new subclass, it will have its own test as well. But it will have some assurance that the superclass does its job properly.
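A minimal sketch of Figure 7's idea (Account and SavingsAccount are hypothetical names): a bare-bones test subclass lets us exercise the extracted superclass on its own, rather than only through the original class.

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class AccountTest {
    // Hypothetical extraction: Account was pulled up out of SavingsAccount.
    static abstract class Account {
        private double balance;
        void deposit(double amount) { balance += amount; }
        double balance() { return balance; }
    }

    static class SavingsAccount extends Account { }

    // A minimal concrete subclass, existing only to test the superclass.
    static class TestAccount extends Account { }

    @Test
    public void depositIncreasesBalance() {
        Account account = new TestAccount();
        account.deposit(50.0);
        assertEquals(50.0, account.balance(), 0.0);
    }
}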

Example

Many algorithms for searching in graphs or other structures have a common form, something like this:

Stack candidates = initialCandidates();  // seeded with the starting candidates
while (!candidates.isEmpty()) {
    Something x = (Something) candidates.pop();
    if (x.acceptable()) return x;
    pushMoreCandidates(candidates, x);
}

This version of the algorithm uses a stack to manage the candidates, although many variations don't rely on the stack discipline: they just want a new candidate to work with. (Other disciplines include queue, priority, random, and so on.)

Suppose we extract a Candidates class to encapsulate that decision, as shown in Figure 8.

Figure 8. Extracting a Candidates class
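Here's a rough sketch of the extraction (all the names are mine, not from the original code); the loop now depends only on Candidates, and the stack becomes an implementation detail:

import java.util.ArrayDeque;
import java.util.Deque;

interface Something {
    boolean acceptable();
}

// Hypothetical extraction: callers see add/isEmpty/next, not the stack.
class Candidates {
    private final Deque<Something> items = new ArrayDeque<>();

    void add(Something candidate) { items.push(candidate); }  // LIFO, for now
    boolean isEmpty() { return items.isEmpty(); }
    Something next() { return items.pop(); }
}

class Searcher {
    // The loop from above, rewritten against Candidates; it no longer cares
    // which discipline (stack, queue, priority, ...) is behind next().
    Something search(Candidates candidates) {
        while (!candidates.isEmpty()) {
            Something x = candidates.next();
            if (x.acceptable()) return x;
            pushMoreCandidates(candidates, x);
        }
        return null;
    }

    void pushMoreCandidates(Candidates candidates, Something from) {
        // domain-specific: derive new candidates from 'from' and add them
    }
}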

How Is It Tested?

The algorithm uses the stack to hold a set of candidate values. Suppose the original algorithm is constructed in such a way that it never generates duplicate candidates. Then no test of the algorithm will be able to ascertain whether the stack is a true stack, or one that ignores duplicates.

However, once Candidates is extracted, it may be used in new contexts. If the next use requires set behavior, it might fail – or fail to terminate – without it.
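For instance, a client needing set behavior brings a test like this (a sketch with hypothetical names), one that no test routed through the original algorithm could have expressed; the extracted class must filter duplicates to pass it:

import static org.junit.Assert.assertTrue;

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

import org.junit.Test;

public class SetCandidatesTest {
    // Variant of the extracted class that ignores already-seen candidates.
    static class SetCandidates {
        private final Deque<String> items = new ArrayDeque<>();
        private final Set<String> seen = new HashSet<>();

        void add(String candidate) {
            if (seen.add(candidate)) items.push(candidate);  // skip duplicates
        }
        boolean isEmpty() { return items.isEmpty(); }
        String next() { return items.pop(); }
    }

    @Test
    public void ignoresDuplicateCandidates() {
        SetCandidates candidates = new SetCandidates();
        candidates.add("a");
        candidates.add("a");  // duplicate: silently dropped
        candidates.next();
        assertTrue(candidates.isEmpty());
    }
}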

A test that's good enough before refactoring might not be good enough afterward.

Consider the tests:

  • Some are testing the algorithm with little regard to the stack.
  • Others test the stack with little regard to the algorithm.
  • Finally, some tests focus on the interaction between the algorithm and the stack.

To make our tests best accommodate the new structure, we want to move tests around. Tests of the algorithm can stay where they are. Tests of the stack can move to a new test class focusing on testing the Candidates. Extracting this class will expose some previously hidden behaviors; we need to fill out the Candidates tests by adding tests for those behaviors, as shown in Figure 9.

Figure 9. Creating a new test client

Tests of the interaction are the most interesting. Earlier, these tests were trying to test the Algorithm and the Stack together. To do this, those tests tried to force the Stack into different states, but it may not have been possible to use the Algorithm to get the Stack into every state we would like to test. With the Candidates class now standing outside the Algorithm, it should be possible to better test that part of the code. This may let us simplify the tests in the Algorithm that were focused on the interaction; those tests can focus on the parts that are interesting from the point of the view of the Algorithm and not try to test Candidates as well.

Refactoring may cause you to add, move, or delete tests.

Sometimes someone says, "I need to expose the private parts of this class so I can be sure what's going on with it." This may indicate that there's a hidden class, just as Candidates was hidden in Algorithm. Extracting the class lets us test it independently, and reduces the urge to expose parts of the original class unnecessarily. This also helps the tests be robust, as they're not using internal details – they respect the class's secret.

What Tests Should We Add?

What tests should be added in response to a design change? I consider this guideline:

Add the tests you would have added if you had created the new design via test-driven development.

This means that you'll ask yourself, "What test would have caused me to write this line of code?" This will force you to consider each statement and why it's there. Then add tests you feel you need to adequately test the public interface of the classes involved.

When Should We Add Tests?

The crux of the problem is that code tested well in its original context may not work in a new context. There are several points at which you might choose to adjust your tests.

One approach is not to worry about it. Trust that your tests of new features will catch any uses of code refactored to a new context. This rule has the advantage that it's easy to follow, but it may not be as safe as a more aggressive approach. You'll need to be sensitive to how well your tests are finding problems.

Another approach is to add tests when you use a class in a new way. When you refer to a class, see whether its tests cover the new context in which you plan to use it. If not, beef up the tests before you add code. This practice lets you create new tests "just in time," but the rule requires discipline to work.

The most aggressive approach is to add and change tests just after you refactor. It may seem that this would be a hard discipline to follow, but I've found it easier to do it this way than the previous approaches. When adding new code, I find I don't want to stop and backfill tests, but I'm in a more reflective mood when I'm refactoring.

You can choose between these approaches on a case-by-case basis, but pay attention to the feedback you get: Do manual refactorings cause problems, especially ones that don't show up for a while? Do you find problems when using code in a new context? Do your tests miss any problems? If so, improve the discipline of your testing.

Bottom Line

If you focus on refactoring the system under development, and do only the minimal amount of refactoring of tests, your system will be harder to change than it needs to be. You put a lot of effort into giving your system the best design you can; neglecting to update your tests correspondingly leaves your system riskier to extend, more difficult to test, and less clear than it could be.

These guidelines may help:

  • A test that's good enough before refactoring might not be good enough afterward.
  • Refactoring may cause you to add, move, or delete tests.
  • Add the tests you would have added if you had created the new design via test-driven development.

[Originally published at InformIT.com, January, 2004.]

Refactoring Challenge – The Amazing Maze

"Amazing" is a maze generation program from the book BASIC Computer Games, by David Ahl. (The maze program was created by Jack Hauber). The code is used with permission of David Ahl, www.SwapMeetDave.com.

Alan Hensel mentioned using this program as inspiration for a refactoring exercise; I remembered it fondly from my BASIC days, so I thought I'd convert it to Java. 

I found the original code pretty much incomprehensible: most lines consisted of an assignment and a two-way branch.

Your mission is to refactor this into a program that can be understood. You can't do everything with automated steps, but to do it properly you'll have to take tiny, safe steps. Here's the code: Amazing.zip.

Subjunctive Programming

Programming in the world of "What If?"

The subjunctive is the grammatical mood used to talk about possible worlds. (You wouldn't say, "I were in charge," but with the subjunctive you would say, "If I were in charge.") Subjunctive programming is an attitude and set of techniques that can help you obtain deeper insight into your problem and its solution.

Stance

The attitude you hold toward your software is crucial.

Some teams have legacy software, developed by people who left long ago, and occasionally maintained by others since then. In this situation, some teams' mantra is "If it ain't broke, don't fix it." Local repairs are the order of the day, even when the team knows this is papering over a deeper problem. This is a stance of fear.

We want a stance of courage. Our software should be under our control, pliable to our will. Our view is, "We have permission to experiment. We have a commitment to explore, but not a commitment to any particular result."

Sometimes a bet doesn't pay off; an experiment just isn't an improvement. But often enough to repay all the failures, we gain a crucial insight that transforms our program.

Passing all the tests is important. Reducing duplication is important. Communicating through the code is important. But we seek insight as well. Our code may communicate what we've learned so far, but we have more to learn.

Ward Cunningham spoke at XP Universe '01 of the feeling of "waiting for insight." There is a niggling tension when we realize there's a new insight to be had, but we don't yet know what it is. We can savor this tension, and gently try things, all the more to appreciate the release when the new ideas work.

Environment

Two types of environment are involved: the social environment and the technical environment. As is so often true, the social environment has the bigger influence.

The social environment must be one in which it's okay to fail in small ways. A lack of failure suggests that there is no risk-taking, that a team is chasing incremental sure things and avoiding riskier big wins. Some experiments won't work out. This can be particularly challenging for management that is used to a tightly controlled task list.

I have a friend who is lead consultant on a large (non-XP) project that's in danger of delivering late. Every time he stands up, the manager asks, "What's the matter? Aren't you working on such-and-such a task? It's supposed to be done tomorrow." How much experimentation will happen in an environment like this? The social context and reward structure discourage new ideas. This is plan-driven development at its worst.

When it's socially acceptable to experiment, it helps to have a supportive technical environment. The foremost need is for a configuration management (CM) system. An experiment is usually done in a sandbox, a special area where changes can be made without accidentally affecting (or being affected by) other developers' work. Some CM systems have a branch mechanism where you can check in intermediate versions without releasing them into the main line of development until you're ready. (Many experiments won't require checkins during the experiment, but it's handy when you need it.)

In normal development, it's XP practice to check in to the main line very frequently. Experiments should not do this until they've been assessed. (Remember our stance – committing to an experiment doesn't commit you to its result.) Checking an experiment in to the main line mid-stream blurs the line between the main path of development and the experiment, and makes it hard to back out the experiment when necessary.

(I usually find it worthwhile to go the other way around – frequently pulling in changes from the mainline to the sandbox during the experiment. This makes it much easier to integrate when the experiment works out.)

Three Techniques

Thought Experiments

Some experiments don't involve code; they involve thinking about the system.

For example, today your production system sees a maximum of 10 concurrent users. The peak CPU usage is 8%, peak I/O activity is 5%, and network use is negligible. How many peak concurrent users can we expect to be able to support?

A simple analysis might say, "CPU is the limiting resource, 100%/8% = 12.5, so we expect to support up to 12.5*10=125 users." A slightly deeper analysis might say, "We have a rule of thumb that we should limit apps to 80% of capacity, so 80%/8% = 10 => up to 100 concurrent users."

We still might need to do experiments to verify this, but already we have expectations. If the experiment showed a peak of 15 or of 500 users, we'd know we ought to investigate more deeply.

We can use "what-if" questions as well: What if we generated keys instead of using human-created ones? What if we were to make this class a Composite? What if we had 100 times as much data? What if everybody in the country had an account?

Thought experiments have limits, of course. Sometimes our mental models are just plain wrong, or are inadequate for the current situation. Other times, we reach impasses that can't be resolved without looking at the real world. It's important to know when to move from thought experiments to real ones.

Kent Beck uses a rule, "No technical discussion should go longer than ten minutes without running a concrete experiment." This helps prevent daydreamy thought experiments that become endless blue-sky or gripe sessions.

Spikes

A spike is a code experiment focused on a narrow aspect of a problem. It may involve playing with the program we're developing, writing a new small program, or trying something else.

For example, you might be worried about database performance in a crucial area of your application. You might try a spike where you run the slowest query (on random data) as many times as possible in 15 minutes. The reality is that your users will do a mix of many query types, but this way you get a worst case without as much work as if you tried to model a realistic workload. (You may run another spike to verify that caching effects aren't making you look too good.)
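A spike like that can be only a few lines. For example (a throwaway sketch; runSlowestQuery and the random key are stand-ins for the real database call and data):

import java.util.Random;

public class QuerySpike {
    public static void main(String[] args) {
        Random random = new Random();
        long deadline = System.currentTimeMillis() + 15 * 60 * 1000;
        int runs = 0;
        while (System.currentTimeMillis() < deadline) {
            runSlowestQuery(random.nextInt(1_000_000));  // worst-case query, random key
            runs++;
        }
        System.out.println(runs + " runs in 15 minutes");
    }

    // Stand-in: a real spike would issue the actual slowest query here.
    static void runSlowestQuery(int key) {
    }
}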

Or, perhaps you'd like to switch to a new logging subsystem. If you've never used it before, you might write a small application that logs a lot but doesn't do any real work.

A spike lets you quickly explore an abstraction of your real problem, without getting caught up in irrelevant details.

Refactoring

A lot of discussion about refactoring focuses on smells, transformations, and improving code. These are important concerns. But refactoring can be an exploratory tool as well. By using refactoring's safe transformations, we can find out the real consequences of a proposal (rather than guessing, and cleaning up all the bugs later).

Refactoring lets us play with the balance of responsibility between objects, and lets us introduce new abstractions when we need to.

For example, consider software that has an underlying model with multiple views on it. (This is an abstraction of a real situation.)

The Compound has a list of View1 objects. A View1 will only be in the Compound if it was explicitly added to it. Given a Compound (and other information not shown), the ViewFinder can find all the View2 and View3 objects that have the same base. When the ViewFinder was originally introduced, View2 and View3 were not conceived as views on the Base, but they had evolved to become that.

We decided to try an experiment: what if the Compound was regarded as the set of all views implied by what was added (thus eliminating the ViewFinder)? This turned out to simplify a number of things, so we kept the result.

We had valuable, working code; we didn't want to jeopardize that. The code communicated what we understood, but we re-conceptualized this corner of code, and learned to understand even better.

Conclusion

The Scrum software development process [Schwaber] grew out of new product development ideas, and its leaders have always stressed knowledge creation as a key dimension of software development.

By actively trying variations on our software theme, and by learning to better sense the tension of missing insight, we can work toward better software – and better ideas.

[Written September, 2003.]