Home-Grown Refactorings – Be Careful!

When you clean up code, you often need refactorings that aren’t in any tool, and aren’t in Martin Fowler’s Refactoring catalog.

You’re on your own.

When this happens, think like a language lawyer and a logician.

Language Lawyer and Logician

In Martin’s phrase, “Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure.”

“Does not alter the external behavior” is the part we want to focus on: refactoring requires us to consider the semantics of the programming language. Exactly what does an “if” statement mean? A variable? etc.

Some languages are fully specified. Others have an informal spec. Still others are defined “whatever the compiler does in that case.” We have to decide what level of formality we require to justify our transformations.

Once we understand the semantics, we can manipulate them. This is where the logician hat comes in. We can deduce possible refactorings and optimizations based on those semantics.

For example, we might apply the logical notion “A or A = A” to this code:
if (a > 0 | a > 0) {do something}
and simplify it to:
if (a > 0) {do something}

The logic gives us an idea, and then we have to make sure the language semantics match up to it.

Looking for Loopholes

When we have a potential refactoring like the one above, it’s easy to design a transformation that is OK for some cases, but not every case.

I then take a more adversarial stance: Where might it fail?

Let’s work through a few examples.

What If?

When we refactor a conditional, it’s helpful to work case by case:
* Figure out all the possible paths
* Figure out all the possible conditions that control them
* Consider each possible input condition and each possible path: do they have the same effect if you refactor?

We’ll look at an example where this works, and some examples where it doesn’t.

Example 1

A statement such as if (A && B) has two interesting concerns:
The logical concern (which applies whether the operator is & or &&): What happens for the four possibilities: Both A and B true, both A and B false, only A true, only B true?
The short-circuit concern: Is B safe to evaluate?

If you want to change this code, you need to make sure that each path has the same result as the code without the change.

if (age > 12 && score > 650) // age and score both integers
    permit()
else
    deny()

Can we swap the order?

if (score > 650 && age > 12)
    permit()
else
    deny()

Do we get the same logical result? Let’s use a truth table:

age > 12   score > 650     Original     Revised
    F            F          deny()       deny()
    F            T          deny()       deny()
    T            F          deny()       deny()
    T            T          permit()     permit()

The logical concern is covered, since the logical result is the same regardless of the order of terms.

What about short-circuit evaluation? Well, “score > 650” is a simple integer comparison. There’s nothing harmful in evaluating it first.

So, this refactoring appears to be valid.

Can we force this refactoring to fail?

That is pretty simple code, just using integers. Neither can be null.

But we can envision a troubling scenario when there are threads or volatile variables involved.

For example, assume these variables are unprotected across threads, and there’s another thread out there clearing score if it detects age > 12. Depending how things are scheduled, this other thread could run at just the wrong moment and make our code behave differently after refactoring.

Or, assume both variables are volatile, memory-mapped, and we have a circuit that quickly detects a read of age and immediately clears score. In the original code, this would deny() every time. In the revised code, if both conditions start true, it would permit().

You have to dive into the semantics of your particular language to understand whether this sort of thing can happen.

But this is the kind of adversarial thinking you must use to get refactorings that are known safe.

Example 2

Let’s try again with the same structure:

if (card.monthsInUse > 12 && credit.score > 650) {
    permit()
} else {
    deny()
}

Could it become…?

if (credit.score > 650 && card.monthsInUse > 12) {
    permit()
} else {
    deny()
}

You should be able to convince yourself that the truth table will be the same as in the first example.

Is it a valid refactoring?

We’d love to say “yes” but the answer is “it depends”.

One possible problem comes if credit can be null: the short-circuit order issue.

Here’s the code in context:

cards.monthsInUse = 10;
if (random() < 0.5) credit = null;

if (card.monthsInUse > 12 && credit.score > 650) {
    permit()
} else {
    deny()
}

The original code would always take the “else” clause; in the revised code, it would sometimes blow up.

Takeaway: You can’t look at an “if” statement in isolation and decide which refactorings are safe. Its surrounding code can affect it too.

Example 3

Here’s a somewhat different example. Assume threads, volatile variables, and null are not involved.

We’d like to delete the “useless” inner error check (as card2.monthsInUse can’t be less than 12 if we already know it’s greater than 12).

if (card1.monthsInUse > 12 && card2.monthsInUse > 12) {
    card1.monthsInUse = 0;
    if (card2.monthsInUse < 12)
        throw new Error;
}

=>?

if (card1.monthsInUse > 12 && card2.monthsInUse > 12) {
    card1.monthsInUse = 0
}

As I’m sure you’ve guessed: it isn’t safe. Card1 and card2 might refer to the same card. (This is known as aliasing.)

The original code may throw sometimes; the revised code never would.

The Troublemakers

Certain issues show up as troublemakers quite often:

  • Threads – another thread might change state on us
  • Volatile variables – external factors might change state
  • Aliasing – two names might sometimes refer to the same thing
  • Reflection – (No example given) Using reflection with a name from the outside world (e.g., a configuration file) might break if we’ve changed names or signatures
  • Null – a null pointer might break a refactoring

When you create a refactoring, make sure none of these apply to your situation.

Summary

  • Be very careful when you invent your own refactorings.
  • When working with conditional logic, consider all the combinations and whether the old and new code respond exactly the same.
  • With logical operators, consider both the logical equivalence and the issue of short-circuit evaluation.
  • Confusingly, the code before and inside a conditional can affect the validity of a refactoring.
  • While we often get by with a simple model, languages also have issues around threads, volatile variables, aliasing, reflection, and null.