Refactoring: An Example, Extended

An earlier refactoring example is taken further, introducing true templates and improving performance.

Introduction

This paper extends an earlier example of refactoring. (See "Refactoring: An Example" in Extreme Programming Explored) In that paper we refactored a Java program that was creating too many strings (and was in general need of cleanup as well). One of the things we identified was that the code worked with strings instead of introducing and using a template type. (This is known as "primitive obsession" in Martin Fowler's Refactoring book.)

This paper finishes the example by introducing a template class, modifying it to be much more efficient, and adjusting the tests as needed. In the end, the final result is much better than the original.

[I won't repeat the starting point code here; see the earlier article.]

Introducing Template

The substitute() routine embodies two things: the codes to be substituted (with their replacement values), and a strategy for making multiple substitutions. A template should embody the latter but not the former. Looking at the code, we realize it will be better to split these aspects before extracting the new class.

If we had a template class, a natural way to tell it what to do would be to give it a map from patterns to replacements. In Java, the type java.util.Hashtable can provide such a mapping. (In JDK 1.2 and later, you might use the collection classes.)

We'll begin by adding the map setup to substitute():

Hashtable map = new Hashtable();
        map.put("%CODE%", reqId);
        map.put("%ALTCODE%", reqId.substring(0,5) + "-" + reqId.substring(5,8));

("Map's not used – this new code doesn't change anything yet." "Good.") (Run the test.)

Now we can extract a new method substituteAll(). It needs access to the output stream, as well as the map. We have:

    void substituteAll (Hashtable map, PrintWriter out) throws IOException {
        String reqId = map.get("%CODE%").toString();
      
        StringWriter templateOut = new StringWriter();
        substituteCode(sourceTemplate, "%CODE%", reqId, templateOut);

        String altId = reqId.substring(0,5) + "-" + reqId.substring(5,8);
        substituteCode(templateOut.toString(), "%ALTCODE%", altId, out);
    }

    public void substitute(String reqId, PrintWriter out) throws IOException {
        Hashtable map = new Hashtable();
        map.put("%CODE%", reqId);
        map.put("%ALTCODE%", reqId.substring(0,5) + "-" + reqId.substring(5,8));

        substituteAll(map, out);

        out.close();
    }

(Run the test.)

Next, extract the Template class. The substitute() method no longer has any knowledge of the substitution strategy, but substituteAll() still reflects the business logic that determines what is to be substituted. Start the Template class as a near-copy of CodeReplacer: just replace "CodeReplacer" with "Template". Copy the tests as well. (Run the test.)

We won't need method Template.substitute() since it exists only to set up the map. Revise the test to set up the map directly, eliminating the reference to substitute(). (Run the test.)

Delete substitute(), and run the test again.

Back in CodeReplacer, let's make its caller responsible for identifying the template to use:

    public class CodeReplacer {
        Template sourceTemplate;

        public CodeReplacer (Template t) {
            sourceTemplate = t;
        }

        public void substitute(String reqId, PrintWriter out) {
            ...
            sourceTemplate.substituteAll(map, out);
            ...
        }
    }

Eliminate CodeReplacer's unused routines readTemplate() and substituteCode(). (Run the test.)

One last bit of cleanup: move the call to out.close() to the caller of substitute(), on the principle that this routine shouldn't manage the I/O stream. (Run the test.)

CodeReplacer now looks like this:

import java.io.*;
import java.util.*;

/* Replace %CODE% with requested id, and %ALTCODE% with "dashed" version of id. */

public class CodeReplacer {
    Template sourceTemplate;

    public CodeReplacer(Template t) {
        sourceTemplate = t;
    }

    public void substitute(String reqId, PrintWriter out) throws IOException {
        Hashtable map = new Hashtable();
        map.put("%CODE%", reqId);
        map.put("%ALTCODE%", reqId.substring(0,5) + "-" + reqId.substring(5,8));

        sourceTemplate.substituteAll(map, out);
    }
}

The routine is pretty clean: it tracks a template, sets up replacements, and makes substitutions when asked to do so.
 

Organizing Template

The Template class is not in as good a shape. First, we'll make it consult the map instead of worrying about how to build the substitution strings:

    void substituteAll (Hashtable map, PrintWriter out) throws IOException {
        String pattern;
        StringWriter templateOut = new StringWriter();

        pattern = "%CODE%";      
        substituteCode(sourceTemplate, pattern, map.get(pattern).toString(), templateOut);

        pattern = "%ALTCODE%";
        substituteCode(templateOut.toString(), pattern, map.get(pattern).toString(), out);
    }

This code uses a strategy that says to send the first substitution to an intermediate stream, but send the second (last) to the real output stream.

We'd like to have the template avoid knowing that %CODE% and %ALTCODE% are the particular patterns. We can fetch these "generically" from the map. Does it matter in what order the substitution occurs? It's a little subtle, but the answer is "yes". Suppose the template is "%CODE%" and the map is:
    %CODE% -> %ALTCODE%
    %ALTCODE% -> xyz

If we replace "%CODE%" first, the result will be "xyz". If we replace "%ALTCODE%" first, the result will be "%ALTCODE%".

Should it matter in what order they occur? To find that out, you'll need to check with the users (or at least the tests). Let's assume they're willing to say "No patterns are allowed in the replacement strings" or "The template will act as if it scanned once only." This is a reasonable restriction: we're trying to build a simple replacement facility, not a full-blown macro programming language.

Given that restriction, we can safely let the map drive the substitution:

Enumeration enum = map.keys();

        pattern = enum.nextElement().toString();
        substituteCode(sourceTemplate, pattern, map.get(pattern).toString(), templateOut);

        pattern = enum.nextElement().toString();
        substituteCode(templateOut.toString(), pattern, map.get(pattern).toString(), out);

(Run the test.)

We can generalize our machinery:

    void substituteAll (Hashtable map, PrintWriter out) throws IOException {
        String workingTemplate = sourceTemplate;
        Enumeration enum = map.keys();

        while (enum.hasMoreElements()) {
            String pattern = enum.nextElement().toString();
            StringWriter templateOut = new StringWriter();

            substituteCode(workingTemplate, pattern, 
                map.get(pattern).toString(), templateOut);

            workingTemplate = templateOut.toString();
        }

        out.write (workingTemplate);
    }

(Run the test.)

Let's clean up the names of the routines. Let's replace "sourceTemplate" with "template", "readTemplate()" with "load()", "substituteCode()" with "replace()", and "substituteAll()" with "replace()" (with different arguments). We'll need to adjust the callers and tests. (Do it a step at a time, running the test for each change.) Template now looks like this:

import java.io.*;
import java.util.*;

public class Template {
    String template;

    public Template (Reader reader) throws IOException {
        template = load(reader);
    }

    String load(Reader reader) throws IOException {
        BufferedReader br = new BufferedReader(reader);
        StringBuffer sb = new StringBuffer();
        try {
            String line = br.readLine();
            while (line!=null) {
                sb.append(line);
                sb.append("\n");
                line = br.readLine();
            }
        } finally {
            try {if (br != null) br.close();} catch (IOException ioe_ignored) {}
        }
        return sb.toString();
    }

    void replace (
      String template, String pattern, String replacement, Writer out)
                        throws IOException {
        int templateSplitBegin = template.indexOf(pattern);
        int templateSplitEnd = templateSplitBegin + pattern.length();
        out.write(template.substring(0, templateSplitBegin));
        out.write(replacement);
        out.write(template.substring(templateSplitEnd, template.length()));
        out.flush();
    }


    void replace (Hashtable map, PrintWriter out) throws IOException {
        String workingTemplate = template;
        Enumeration enum = map.keys();

        while (enum.hasMoreElements()) {
            String pattern = enum.nextElement().toString();
            StringWriter templateOut = new StringWriter();

            replace(workingTemplate, pattern, 
                map.get(pattern).toString(), templateOut);

            workingTemplate = templateOut.toString();
        }

        out.write (workingTemplate);
    }

}

This is substantially cleaner code than the original, but we're still left with a potential performance issue in the time spent creating and scanning strings multiple times. Can we eliminate the multiple scans?
 

Applying Insight

We scan through the map of replacements once, and rewrite the template once per item. This creates a number of extra strings, as the templates are scanned and re-assembled. It also takes time to scan repeatedly.

What if we turn things around? Go through the template (once) looking for patterns, and writing and replacing as we go. This should be an improvement: the template scan requires only one pass through the template, and map lookup is reasonably cheap.

Scanning the template only once is a big improvement, but we can do even better. Notice that the template is stable, while the map changes each time. This implies that any pre-processing we can do for the template can be re-used across all uses of the template.

In our case, we could break a template into a sequence of plain strings and patterns: "This %template% %with% codes" becomes:

This
    %template%
    %with%
    codes

We can scan this list quickly:

  • if the item is a pattern, look it up, and write the substituted text;
  • if the item is plain text, write it out.

Could we have reached this approach without some insight? I don't know, but I wouldn't necessarily expect every transformation to be automatic. Getting insight usually requires thinking time. The cleaned-up code enabled us to focus on exactly what was required, and the expensive and repeated string searches gave us the incentive to find a better way.

For the code, suppose we have the template divided into an array parts[]
containing strings as previously described. The code for substitute() becomes:

    void replace (Hashtable map, PrintWriter out) throws IOException {
        for (int i=0; i < parts.length; i++) {
            String piece = parts[i];
            if (piece.charAt(0) == '%') {
                out.print (map.get(piece).toString());
            } else {
                out.print (piece);
            }
        }
    }

The setup is more challenging:

    void setup(String template) {
        String rest = template;
        Vector partsList = new Vector();
        int pos1;
        
        while ((pos1 = rest.indexOf(MARKER)) != -1) {
            String front = rest.substring(0, pos1);
            partsList.addElement(front);
            int pos2 = rest.indexOf(MARKER, pos1+1);
            if (pos2 == -1) break;
            partsList.addElement(rest.substring(pos1, pos2+1));
            rest = rest.substring(pos2+1);
        }
        partsList.addElement(rest);

        parts = new String[partsList.size()];
        for (int i=0; i < partsList.size(); i++) {
            parts[i] = partsList.elementAt(i).toString();
        }
    }

To put this in, first add the setup routine without affecting the existing template, and run the test. Then call the setup routine, and run the test. Then add the new replacement. (I used a temporary function.) Run the test. Finally, call the new replacement mechanism. Run the test. Move the temporary function into the main replace code (test); delete the now-unused replace(String,String,String,Writer) method and test. Finally, remove the template field, and pass it as an argument to setup(). You'll have to adjust the test to not look for it any more. (Run the test.)

Template.java looks like this:

import java.io.*;
import java.util.*;

public class Template {
    final static char MARKER = '%';

    String[] parts;

    public Template (Reader reader) throws IOException {
        String template = load(reader);
        setup(template);
    }

    String load(Reader reader) throws IOException {
        BufferedReader br = new BufferedReader(reader);
        StringBuffer sb = new StringBuffer();
        try {
            String line = br.readLine();
            while (line!=null) {
                sb.append(line);
                sb.append("\n");
                line = br.readLine();
            }
        } finally {
            try {if (br != null) br.close();} catch (IOException ioe_ignored) {}
        }
        return sb.toString();
    }

    void setup(String template) {
        String rest = template;
        Vector partsList = new Vector();
        int pos1;
        
        while ((pos1 = rest.indexOf(MARKER)) != -1) {
            String front = rest.substring(0, pos1);
            partsList.addElement(front);
            int pos2 = rest.indexOf(MARKER, pos1+1);
            if (pos2 == -1) break;
            partsList.addElement(rest.substring(pos1, pos2+1));
            rest = rest.substring(pos2+1);
        }
        partsList.addElement(rest);

        parts = new String[partsList.size()];
        for (int i=0; i < partsList.size(); i++) {
            parts[i] = partsList.elementAt(i).toString();
        }
    }

    void replace (Hashtable map, PrintWriter out) throws IOException {
        for (int i=0; i < parts.length; i++) {
            String piece = parts[i];
            if (piece.charAt(0) == MARKER) {
                out.print (map.get(piece).toString());
            } else {
                out.print (piece);
            }
        }
    }

}

The Tests

At this point, I'd take off my refactoring hat, and put on a testing hat, trying to go through the tests and make sure they cover exactly what I want.

The first test I'll add is one for an empty template:

    public void testEmptyTemplate() {
        StringWriter stringOut = new StringWriter();
        PrintWriter testOut = new PrintWriter(stringOut);

        try {
            Template t = new Template(new StringReader(""));
            t.replace(new Hashtable(), testOut);
            assertEquals("", stringOut.toString());
        } catch (IOException ex) {
            fail("testEmptyTemplate() exception " + ex);
        }
    }

When I run this test, it fails. A little examination reveals that we don't want to add empty strings to the partsList vector if it is empty. Replace these lines:

partsList.addElement(front);
                :
        partsList.addElement(rest);

with these:

if (front.length() != 0) {partsList.addElement(front);}
                :
        if (rest.length() != 0) {partsList.addElement(rest);}

The next test is for when the template consists of one line with exactly one pattern:

    public void testOnePattern() {
        StringWriter stringOut = new StringWriter();
        PrintWriter testOut = new PrintWriter(stringOut);

        try {
            Template t = new Template(new StringReader("%PAT%\n"));
            t.replace(new Hashtable(), testOut);
            assertEquals("\n", stringOut.toString());

            assertEquals(t.parts[0], "%PAT%");
            assertEquals(t.parts[1], "\n");
        } catch (IOException ex) {
            fail("testEmptyTemplate() exception " + ex);
        }
    }

This test also fails. (Sigh.) It fails because we haven't defined what should happen if the pattern is not in the map. The test implies the result should be an empty string. (Currently we throw an exception.) This is a case where it's worth consulting the user: "What do you want to happen if there's no match for a pattern? Some reasonable choices might be an empty string, the pattern itself, an error string, or a thrown exception." User: "Give me the pattern itself – then I'll have something to suggest what happened." So fix the test, and fix the code:

if (piece.charAt(0) == MARKER) {
                Object replacement = map.get(piece);
                out.print ( (replacement == null) ? piece : replacement.toString());
            } else {
                out.print (piece);
            }

Next, we'd like to verify that our pattern parsing doesn't get confused when two patterns occur in a row:

    public void testTwoPatterns() {
        StringWriter stringOut = new StringWriter();
        PrintWriter testOut = new PrintWriter(stringOut);

        try {
            Template t = new Template(new StringReader("%PAT%%AGAIN%\n"));
            t.replace(new Hashtable(), testOut);
            assertEquals("%PAT%%AGAIN%\n", stringOut.toString());

            assertEquals("%PAT%", t.parts[0]);
            assertEquals("%AGAIN%", t.parts[1]);
            assertEquals("\n", t.parts[2]);
        } catch (IOException ex) {
            fail("testEmptyTemplate() exception " + ex);
        }

    }

We run this test, and we're OK.

Finally, we'll make a pass through the tests, moving some things to setUp(), and others out, in a spirit of general polishing. Here's the final version of CodeReplacerTest.java:

import java.io.*;
import junit.framework.*;
import java.util.Hashtable;

public class CodeReplacerTest extends TestCase {
    static final String templateContents = "xxx%CODE%yyy%ALTCODE%zzz\n";

    StringWriter stringOut;
    PrintWriter testOut;

    public CodeReplacerTest(String testName) {super(testName);}

    protected void setUp() {
        stringOut = new StringWriter();
        testOut = new PrintWriter (stringOut);
    }

    public void testReplacerSubstitution() {
        try {
            Template template = new Template(new StringReader(templateContents));
            CodeReplacer replacer = new CodeReplacer(template);
            replacer.substitute("01234567", testOut);
            assertEquals("xxx01234567yyy01234-567zzz\n", stringOut.toString());
        } catch (IOException ex) {
            fail ("testSubstitution exception - " + ex);
        }
    }

    public void testTemplateLoadedProperly() {
        try {
            Template template = new Template(new StringReader(templateContents));
            assertEquals("xxx", template.parts[0]);
            assertEquals("%CODE%", template.parts[1]);
            assertEquals("zzz\n", template.parts[4]);
        } catch (Exception ex) {
            fail("Template couldn't load");
        }
    }

    public void testTemplateSubstitution() {
        try {
            Template template = new Template(new StringReader(templateContents));
            Hashtable map = new Hashtable();
            map.put("%CODE%", "01234567");
            map.put("%ALTCODE%", "01234-567");
            template.replace(map, testOut);
            assertEquals("xxx01234567yyy01234-567zzz\n", stringOut.toString());
        } catch (IOException ex) {
            fail ("testTemplateSubstitution exception - " + ex);
        }
    }

    public void testEmptyTemplate() {
        try {
            Template t = new Template(new StringReader(""));
            t.replace(new Hashtable(), testOut);
            assertEquals("", stringOut.toString());
        } catch (IOException ex) {
            fail("testEmptyTemplate() exception " + ex);
        }
    }

    public void testOnePattern() {
        try {
            Template t = new Template(new StringReader("%PAT%\n"));
            t.replace(new Hashtable(), testOut);
            assertEquals("%PAT%\n", stringOut.toString());

            assertEquals("%PAT%", t.parts[0]);
            assertEquals("\n", t.parts[1]);
        } catch (IOException ex) {
            fail("testEmptyTemplate() exception " + ex);
        }
    }

    public void testTwoPatterns() {
        try {
            Template t = new Template(new StringReader("%PAT%%AGAIN%\n"));
            t.replace(new Hashtable(), testOut);
            assertEquals("%PAT%%AGAIN%\n", stringOut.toString());

            assertEquals("%PAT%", t.parts[0]);
            assertEquals("%AGAIN%", t.parts[1]);
            assertEquals("\n", t.parts[2]);
        } catch (IOException ex) {
            fail("testEmptyTemplate() exception " + ex);
        }
    }

}

Final Template Code

import java.io.*;
import java.util.*;

public class Template {
    final static char MARKER = '%';

    String[] parts;

    public Template (Reader reader) throws IOException {
        String template = load(reader);
        setup(template);
    }

    String load(Reader reader) throws IOException {
        BufferedReader br = new BufferedReader(reader);
        StringBuffer sb = new StringBuffer();
        try {
            String line = br.readLine();
            while (line!=null) {
                sb.append(line);
                sb.append("\n");
                line = br.readLine();
            }
        } finally {
            try {if (br != null) br.close();} catch (IOException ioe_ignored) {}
        }
        return sb.toString();
    }

    void setup(String template) {
        String rest = template;
        Vector partsList = new Vector();
        int pos1;
        
        while ((pos1 = rest.indexOf(MARKER)) != -1) {
            String front = rest.substring(0, pos1);
            if (front.length() != 0) {partsList.addElement(front);}
            int pos2 = rest.indexOf(MARKER, pos1+1);
            if (pos2 == -1) break;
            partsList.addElement(rest.substring(pos1, pos2+1));
            rest = rest.substring(pos2+1);
        }
        if (rest.length() != 0) {partsList.addElement(rest);}

        parts = new String[partsList.size()];
        for (int i=0; i < partsList.size(); i++) {
            parts[i] = partsList.elementAt(i).toString();
        }
    }

    void replace (Hashtable map, PrintWriter out) throws IOException {
        for (int i=0; i < parts.length; i++) {
            String piece = parts[i];
            if (piece.charAt(0) == MARKER) {
                Object replacement = map.get(piece);
                out.print ( (replacement == null) ? piece : replacement.toString());
            } else {
                out.print (piece);
            }
        }
    }

}

Conclusion

The original program is much improved, from the changes we have made in:

  • introducing the Template class
  • organizing it to use a map for arbitrary substitutions
  • re-structuring it to scan the template only once

We've made dramatic improvements in this program compared to the original: the code is simpler, yet performs better than ever. Along the way, we've used the approach of supporting our refactorings with unit tests, and only making changes in small steps.

While most code may not require the dramatic changes we've made, it can usually benefit from the ongoing simplification and refinement caused by refactoring.

Resources

Acknowledgments

Thanks for Steve Metsker's further encouragement to finish the example.

[Written 3-6-2000.]