Refactoring, a Whole-Team Guide

A one-page summary of Refactoring is available in PDF.

Refactoring is…? Changing the design of existing code without changing its observable behavior (per Martin Fowler). These changes include renaming, reducing duplication, rearranging code, and more. Even large refactorings are built from a series of small, safe transformations.

Is Refactoring Rework? Not usually. Refactoring shifts some design work to later in the process (when the programmer has more information). Refactoring also helps us implement complex code by starting with simpler code and evolving it. 

Benefits:

  • Code is smaller and easier to understand.
  • The design is easier to extend to support new capabilities.
  • Less duplication means less chance of coordinated code getting out of sync.
  • Improving how the parts connect makes future changes affect fewer places.

Risks:

  • The biggest risk: programmers say “refactor” when they really mean “rewrite”. Refactoring creates code that is provably equivalent; rewriting is re-doing a section of code. Both are valid, but refactoring is much safer. 
  • Refactoring is a skill that must be learned.
  • Any change carries some risk, but we try to make refactoring low-risk. All of these are valid ways to refactor, but some are safer than others:
      Safer                                 Less Safe
      ----------------------------------    --------------------
      Automated (tool-supported)            Manual
      Manual, following a defined recipe    Ad-hoc manual
      Covered by tests                      Not covered by tests
      Small refactorings                    Big refactorings
  • Other processes and tools can reduce risk too, such as code review (by pair, mob, or another person), and using a version-control system (e.g., git).
  • Refactoring tools work within a programming language. But systems often have parts that are outside the programming language. For example, a configuration file may need updating when a name changes.

How Do You Manage Refactoring? Refactoring is mostly an integrated part of programming work, not a separable activity.

The exception is that a team may want a large refactoring that must be done all at once. (Large refactorings can usually be interleaved with other work.) This may be something that can be deferred, but gets more expensive over time. The programmers had best consult with others before taking that path. If this happens more than “very rarely”, you should explore why. 

A Deeper Dive

Suppose you’re writing a novel about a spy named Harriet. All of a sudden you remember – there’s already a famous spy named Harriet. No problem! Your word processor can do a global replace: Harriet => Beatrix – a sure winner.

As it turns out, this causes two new problems. First, you like the name Beatrix so much, and it sounds familiar, because you already have a character named Beatrix. Second, to make matters worse, you also had another minor character named Harriet, and now she’s a Beatrix too. That’s too many Beatrices, and you really need to work on your character naming!

But suppose you had a novel-aware word processor. It keeps track of characters, and it already warned you about that minor Harriet. When you rename Harriet, it knows to only rename the spy. It also warns you that Beatrix is already in use and lets you change that first. Much safer!

A refactoring IDE (integrated development environment) is a tool programmers use, and it’s like that fancy word processor. It doesn’t treat code as mere text; it understands both grammar and meaning. When you rename in a way that will create conflict, it warns you.
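
To make the contrast concrete, here is a minimal sketch in Python (3.9 or later, for ast.unparse). It compares a global text replace with a rename that works on the parsed structure of the code. The sample functions and the RenameInFunction helper are invented for illustration, and the helper ignores nested functions, globals, and name conflicts, all of which a real refactoring tool has to handle.

```python
import ast

SOURCE = '''
def spy_report(harriet):
    return "Agent " + harriet

def neighbor_report(harriet):
    return "Neighbor " + harriet
'''

# Text-only approach: every "harriet" changes, including the neighbor.
naive = SOURCE.replace("harriet", "beatrix")


class RenameInFunction(ast.NodeTransformer):
    """Hypothetical helper: rename one variable inside one named function."""

    def __init__(self, function_name, old, new):
        self.function_name = function_name
        self.old = old
        self.new = new
        self.inside = False

    def visit_FunctionDef(self, node):
        # Only descend into the function we were asked to change.
        if node.name == self.function_name:
            self.inside = True
            self.generic_visit(node)
            self.inside = False
        return node

    def visit_arg(self, node):
        # Parameters are stored as "arg" nodes.
        if self.inside and node.arg == self.old:
            node.arg = self.new
        return node

    def visit_Name(self, node):
        # Uses of the variable are stored as "Name" nodes.
        if self.inside and node.id == self.old:
            node.id = self.new
        return node


tree = RenameInFunction("spy_report", "harriet", "beatrix").visit(ast.parse(SOURCE))
scoped = ast.unparse(tree)
print(scoped)
```

The text replace turns both Harriets into Beatrix; the structure-aware version changes only the spy.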

A minimal IDE might have three or four refactorings. An advanced one will have 20 or more. Martin Fowler’s book Refactoring (see References) describes more than 70, and it’s not a complete list. 

An automated refactoring often saves minutes or hours of work. A powerful IDE is truly a “force multiplier.”

Refactoring – The Math Analogy

I used a writing analogy above, but refactoring has its roots in arithmetic. 

Suppose you have this division problem:

  3 * 20
   ———
   15

You know that 3*20 is really 60, so your problem is equivalent to 60/15. If you refactor 60 to 4*15, your problem is now:

  4 * 15
   ——— 
   15

Since you recall that you can cancel identical factors, you know the answer is 4. 

Refactoring has that same feeling: pulling things together a different way can simplify a system, making future changes easier. Alan Kay, who helped invent object-oriented programming, said, “Point of view is worth 80 IQ points”, and that definitely applies to code.

Refactoring

Martin Fowler defines it this way: “Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. It is a disciplined way to clean up code that minimizes the chance of introducing bugs.” (See Refactoring in References.)

Refactoring has two key parts:

  1. Changing the design: For code that could be changed in multiple ways, there is no single best design. The best choice depends not just on what the program does today, but also on what it will do tomorrow and in the future. Whether you’re implementing new features or improving old ones, it’s common to need to change the design.
  2. Without changing behavior: It sounds pointless to not change behavior, but it’s easier to improve a design if the behavior stays constant (so you know you haven’t broken anything). 
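
As a small illustration of both parts, the sketch below shows one function before and after a refactoring that removes duplicated formatting code. The functions and sample data are made up for this example. The design changes, and the output does not.

```python
# Before: the money-formatting rule is written out twice.
def receipt_before(items):
    lines = []
    for name, cents in items:
        lines.append(f"{name}: ${cents // 100}.{cents % 100:02d}")
    total = sum(cents for _, cents in items)
    lines.append(f"Total: ${total // 100}.{total % 100:02d}")
    return "\n".join(lines)


# After: the duplicated rule is extracted into one named function.
def format_money(cents):
    return f"${cents // 100}.{cents % 100:02d}"


def receipt_after(items):
    lines = [f"{name}: {format_money(cents)}" for name, cents in items]
    total = sum(cents for _, cents in items)
    lines.append(f"Total: {format_money(total)}")
    return "\n".join(lines)


items = [("tea", 250), ("scone", 305)]
print(receipt_after(items))
```

If the formatting rule ever changes (say, to add a thousands separator), the second version needs the change in one place rather than two.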

How do we know behavior hasn’t changed? That confidence rests on several pillars.

  • Logical/mathematical – Refactorings are designed so we can (at least in principle) prove that they don’t affect behavior.
  • Empirical – proofs can have mistakes, and tools can have bugs. By testing before and after a change, we demonstrate that, for at least some important behaviors, the behavior hasn’t changed.
  • Social – reviews, by humans, provide one more chance to make sure the change is as intended. 
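
The empirical pillar can be sketched in a few lines of Python. Everything here is a hypothetical illustration: disagreements() runs an old and a new version of a function on the same inputs and reports where they differ, counting a raised error as part of the behavior.

```python
def outcome(func, value):
    """Capture what a call does: its result, or the kind of error it raises."""
    try:
        return ("ok", func(value))
    except Exception as error:
        return ("raised", type(error).__name__)


def disagreements(old, new, inputs):
    """Return the inputs on which the two versions behave differently."""
    return [value for value in inputs if outcome(old, value) != outcome(new, value)]


def average_before(numbers):
    total = 0
    for n in numbers:
        total += n
    return total / len(numbers)


def average_after(numbers):
    # Refactored: same behavior, including the error on an empty list.
    return sum(numbers) / len(numbers)


def average_rewritten(numbers):
    # Not a refactoring: an empty list now returns 0 instead of raising.
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)


samples = [[1, 2, 3], [10], []]
print(disagreements(average_before, average_after, samples))
print(disagreements(average_before, average_rewritten, samples))
```

A check like this only speaks for the inputs it tries, which is why it complements the other two pillars rather than replacing them.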

Language Erosion

Refactoring is a lovely term, with a good definition. But unfortunately, terms tend to water down as they’re spread. A too-common informal use of “refactoring” is to mean “rewriting” (deleting a chunk of code and writing it from scratch, hopefully better). 

Rewriting doesn’t make the promises that refactoring does. It doesn’t have as simple a mathematical basis, and it doesn’t guarantee behavior won’t change. It may be the right thing to do, but it is not refactoring. 

Terminology can be even more diluted: some people use “refactoring” to mean “change the design” (perhaps even with an intent to change behavior, as a bug fix or new feature). 

You may have to say “pure refactoring” or “true refactoring” to make sure you’re talking about “refactoring”. It’s the price of success.

Safe Transformations

The core of refactoring is a set of “safe transformations”: transformations that correctly convert code from one shape to a different one. This idea goes back to the study of logic in the late 19th and early 20th centuries – before computers were invented. 

Programs are just text, but we assign meaning to them. Refactorings are transformation rules on that text, that preserve the meaning.

Let me propose a super-simple language. All it does is imitate the way real languages use variables. 

 1 { define x; 
 2   {  define y; 
 3      use x; 
 4      {  define x; 
 5         {  define z; 
 6            use x;
 7         }
 8      } 
 9   }; 
10   use x;
11 }

If you decide to rename “x”, which lines should you change? In most programming languages, naming is based on nesting of {}. So, if you rename the “x” in line 1, the first “define x”, you should also rename “x” in lines 3 and 10, but not line 6!

Why? Because the nesting rule says “when you see a use, work your way out until you find the define that applies”. The second “use x” (line 6) is in a different “scope”; its definition is in line 4. The first “define x” does not apply to it. 
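
For readers who want to see the nesting rule spelled out, here is a rough Python sketch that applies it to the example above. The PROGRAM encoding and the function names are my own invention for this illustration, and the sketch assumes at most one “use” per line.

```python
# The toy program, as (line number, action, name).
# "open" and "close" stand for { and }.
PROGRAM = [
    (1, "open", None), (1, "define", "x"),
    (2, "open", None), (2, "define", "y"),
    (3, "use", "x"),
    (4, "open", None), (4, "define", "x"),
    (5, "open", None), (5, "define", "z"),
    (6, "use", "x"),
    (7, "close", None),
    (8, "close", None),
    (9, "close", None),
    (10, "use", "x"),
    (11, "close", None),
]


def resolve_uses(program):
    """Map the line of each use to the line of the define that applies to it."""
    scopes = []    # stack of {name: defining line}; innermost scope is last
    bindings = {}
    for line, action, name in program:
        if action == "open":
            scopes.append({})
        elif action == "close":
            scopes.pop()
        elif action == "define":
            scopes[-1][name] = line
        elif action == "use":
            # Work outward from the innermost scope to the first matching define.
            for scope in reversed(scopes):
                if name in scope:
                    bindings[line] = scope[name]
                    break
    return bindings


def lines_to_rename(program, define_line):
    """List the define line plus every use that refers to it."""
    bindings = resolve_uses(program)
    uses = [use for use, definition in bindings.items() if definition == define_line]
    return [define_line] + sorted(uses)


print(lines_to_rename(PROGRAM, 1))  # the outer x
print(lines_to_rename(PROGRAM, 4))  # the inner x
```

Renaming the “x” defined in line 1 touches lines 1, 3, and 10; renaming the one defined in line 4 touches lines 4 and 6.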

Understanding this won’t turn you into a programmer or programming language expert. (Odds are, this is more a cure than an infection.) But I wanted to give you a taste of the complexity of refactoring’s transformation rules. 

Renaming is very common. Across files with hundreds of lines, or systems with millions of lines, it would take a long time to carefully check each name by hand and make sure you don’t rename the wrong Harriet. 

Renaming is one of the simpler refactorings. Imagine how much time goes into building a tool that understands the semantics of the language enough to do complicated transformations. It’s paid back many times over as programmers use the tool.

Is Refactoring Rework?

You might wonder if refactoring is rework. After all, you’re paying to change something that’s already working. The answer is – sometimes yes, but usually no.

Think about the work that must get done for a feature. Some of it is defined up-front, some of it as you implement it, and some later when a future change affects it. 

Refactoring and evolutionary design tip the balance: some things that formerly “should” be done up-front can now be done more quickly during development or maintenance. This lowers both the up-front and total cost – but means you do more work later. 

It’s difficult to know what changes will come in the future. If we try to handle all possibilities, it takes far longer to develop, and the code is unwieldy to work with.

When we design, we have to make assumptions. If those assumptions change, we have to pay to undo the old way before implementing the new.

An evolutionary approach is different: make the code support current needs with minimal assumptions, leaving it easy to change to support future needs. Future changes will make use of refactoring – but we’re paying for work (new features) more than rework (redoing old ones). 

Why Refactor?

There are various reasons and times you might refactor. (See Martin Fowler’s “Workflows” in the References.)

Refactoring to Understand: Confronted with messy code, programmers refactor it until they understand it. The programmer decides whether to check in the result. 

Refactoring in Test-Driven Development: The TDD style is evolutionary – it grows a system in small steps. Refactoring is part of that cycle, to consolidate design improvements as the system grows capabilities. 

Preparatory Refactoring: To implement a new feature, we may wish we had a different design. Preparatory refactoring evolves to the new design, to make adding the new feature easier. It’s a normal part of development, but sometimes the preparation takes longer than expected. 

Kent Beck, who popularized refactoring as a core practice in Extreme Programming, describes it this way:

“To make a hard change, first make the change easy. (Warning – this can be hard.) Then make the easy change.”
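
A tiny, hypothetical Python example of that two-step rhythm: the shipping functions and rates below are invented, and the new feature is an express shipping option.

```python
# Starting point: the rates are baked into the formula.
def shipping_cost_before(weight_kg):
    return 5.00 + 1.20 * weight_kg


# Step 1, preparatory refactoring: lift the rates into parameters with
# defaults. Existing callers see exactly the same results.
def shipping_cost(weight_kg, base_fee=5.00, per_kg=1.20):
    return base_fee + per_kg * weight_kg


# Step 2, the easy change: the new feature is now one short function.
def express_shipping_cost(weight_kg):
    return shipping_cost(weight_kg, base_fee=12.00, per_kg=2.00)


print(shipping_cost(2), express_shipping_cost(2))
```

Step 1 changes no behavior, so it can be checked (and checked in) on its own before any new feature work begins.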

Big Refactorings: There are cases where a refactoring is far-reaching. It’s typically not just for the benefit of one new feature, but also to improve the overall system. Wise programmers coordinate this with the whole team. Many projects can’t tolerate “lock the doors” for days, weeks, or months while a big change goes in for no immediate benefit. 

On one hand, a need like this often means the team missed some abstraction or didn’t fully complete an intended design. That deserves reflection.

On the other hand, if you put off a needed refactoring too long, the issue only gets bigger and internal quality degrades.

One example: a team needed to update its tools because operating system updates were taking away capabilities over the years. The product owner consistently rejected this work in favor of new features (even at the expense of slower development). 

Finally, a new release stopped the tools from working at all, and the tool updates had to be done – but the team then couldn’t ship support for the new operating system for several months. The business side may have made the right decision, but a little effort over time could have avoided this.

Another example: a team used a poor persistence mechanism, so they started to convert to a new, better approach. Before they finished the transition, they introduced a third one, and then a fourth one, all without finishing upgrading the original mechanism. This made the system hard to work in – new people never knew which approach to use. 

How Safe is Refactoring?

Every change has risks. (Unfortunately, not changing has risks too.) Refactorings are intended to be mathematically correct, but even that is not a perfect assurance.

Safest to Least Safe Refactorings

For any of the below – it is always safer to have good tests to reassure us that no behavior is changed!

  1. Simple refactoring by a trusted tool
  2. More complex refactoring by a trusted tool
  3. Careful manual refactoring following a proven pattern
  4. Novel manual refactoring relying on language semantics
  5. Novel manual refactoring applying only to a specific situation

Teams put other safety nets in place too:

  • Fine-grained source control – so we can examine changes in small bites
  • Extra eyes: pairing, mobbing, or code review
  • “Differencing” files: comparing code before and after a change
  • Tests at various levels
  • Run-time monitoring
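
“Differencing” is usually done by the version-control system or a review tool, but the idea fits in a few lines. This sketch uses Python’s standard difflib module on two made-up versions of a function; the file names are placeholders.

```python
import difflib

before = [
    "def area(w, h):",
    "    return w * h",
]
after = [
    "def area(width, height):",
    "    return width * height",
]

# unified_diff yields header lines, then "-" for removed and "+" for added lines.
diff = list(difflib.unified_diff(
    before, after, fromfile="before.py", tofile="after.py", lineterm=""
))
print("\n".join(diff))
```

A reviewer reading this diff can see at a glance that only names changed, which is what a pure rename should look like.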

What can still go wrong?

  • The tool has a bug (rare, but it happens)
  • A mistake in manual refactoring
  • Tests, even if they cover a lot, may not cover a particular refactoring
  • Refactoring is defined in terms of code in a programming language, but there are real world dependencies that may conflict 

For example, an external tool may use a name to map certain code to a database. If we refactor to change the name, the external tool will fail unless it is updated too.
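
Here is a minimal Python sketch of that failure, under invented names: EXTERNAL_CONFIG stands in for a file outside the code that a refactoring tool does not see, and run_mapping plays the part of the external tool that looks up code by name.

```python
# Stand-in for a configuration file that lives outside the program.
EXTERNAL_CONFIG = {"table": "customers", "loader": "load_customer"}


class RepositoryBefore:
    def load_customer(self, row):
        return {"name": row[0]}


class RepositoryAfter:
    # Same behavior; a rename refactoring changed only the method name.
    def load_customer_record(self, row):
        return {"name": row[0]}


def run_mapping(repository, config, row):
    """Hypothetical external tool: finds the loader by the name in the config."""
    loader = getattr(repository, config["loader"], None)
    if loader is None:
        raise LookupError(f"no method named {config['loader']!r}")
    return loader(row)


print(run_mapping(RepositoryBefore(), EXTERNAL_CONFIG, ["Beatrix"]))

# After the rename, the stale config breaks the lookup until it is updated too.
updated_config = dict(EXTERNAL_CONFIG, loader="load_customer_record")
print(run_mapping(RepositoryAfter(), updated_config, ["Beatrix"]))
```

The code inside the language is still correct after the rename; the breakage sits in the string that the tool could not follow.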

Conclusion

Refactoring is a power tool for programmers: it makes it possible to improve code with little risk of injecting defects. 

Most of the time, it’s just “something programmers do” without obvious impact to others. Once in a while, it leaks through – a big design change is tackled via refactoring, and takes long enough that others will notice. (However – if the change must be made – refactoring is more likely a tool to address it than a cause of the problem.)

Every change has risks, but refactoring’s risks are lower than many other approaches. The biggest risk is that a team says “refactoring” when they mean “rewriting” or something riskier. 

References

Refactoring: Improving the Design of Existing Code, by Martin Fowler et al. (This version uses Java for examples; there’s a newer edition that uses JavaScript.)

“Workflows of Refactoring”, by Martin Fowler. Retrieved 2022-02-02.