Refactor: Parallel Arrays to Array of Objects

Your code sometimes evolves to have two or more arrays operating in parallel. If these arrays work in lockstep, it might be better to have one array with objects instead. Let’s look at an example refactoring from one to the other.

Parallel arrays becoming an array of (new) objects

I didn’t do this as a formalized refactoring, but rather as a series of moves chosen to nudge things into a better shape, keeping the system working at every point along the way.

Starting Point: Parallel Arrays

I started with a QuerySpec object something like this:

class QuerySpec {
  let columnNumbers: [Int]
  let filters: [FilterSpec]
  ...
}

The class began life as “SortSpec” (it grew out of encapsulating the array of integers). I added the filters later.

But as I looked at the class, I saw that some code worked with columnNumbers and not filters, and some the other way around, but there was also code that worked with both. For example, if the columns were rearranged, we had to rearrange both arrays to keep them synchronized.
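To make the synchronization burden concrete, here is a minimal sketch of a rearranging operation on the parallel-array version. The FilterSpec stand-in, the ParallelQuerySpec name, and the swapping method are all assumptions for illustration; the real class is not shown in that detail.

```swift
// Hypothetical stand-in: the article does not show FilterSpec's contents.
struct FilterSpec: Equatable {
  let pattern: String
}

// Sketch of the starting point: two arrays that must stay in lockstep.
struct ParallelQuerySpec {
  let columnNumbers: [Int]
  let filters: [FilterSpec]

  // Hypothetical rearranging operation. It returns a new value and has to
  // apply the same swap to both arrays.
  func swapping(_ i: Int, _ j: Int) -> ParallelQuerySpec {
    var numbers = columnNumbers
    var newFilters = filters
    numbers.swapAt(i, j)
    newFilters.swapAt(i, j)  // omit this line and the arrays drift apart
    return ParallelQuerySpec(columnNumbers: numbers, filters: newFilters)
  }
}

let spec = ParallelQuerySpec(
  columnNumbers: [3, 1],
  filters: [FilterSpec(pattern: "a"), FilterSpec(pattern: "b")])
let swapped = spec.swapping(0, 1)
```

Every operation that reorders, inserts, or removes columns needs this kind of paired bookkeeping, which is the duplication the refactoring aims to remove.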

Furthermore, I could see this code growing – in the future I want to choose whether it sorts that column ascending or descending. Would I create a third array, and then duplicate the rearranging etc.? (I often mentally “stretch” decisions to see if they make sense under changes.)

Instead of parallel arrays, I wanted an array of ColumnSpec (a new object).

First Move: New Class

1. Create the new class. It starts out with no use at all.

public class ColumnSpec {
  let columnNumber: Int
  let filter: FilterSpec
  ... // just an initializer (constructor)
}

I made it a value object (unchanging once created).
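A minimal sketch of what such a value object might look like in full, assuming a placeholder FilterSpec (its real contents are not shown here) and an initializer with labeled parameters:

```swift
// Hypothetical stand-in: the article does not show FilterSpec's contents.
struct FilterSpec: Equatable {
  let pattern: String
}

// Both properties are `let`, so an instance cannot change after creation.
final class ColumnSpec {
  let columnNumber: Int
  let filter: FilterSpec

  // Assumed initializer shape; the original only says there is one.
  init(columnNumber: Int, filter: FilterSpec) {
    self.columnNumber = columnNumber
    self.filter = filter
  }
}

let column = ColumnSpec(columnNumber: 2, filter: FilterSpec(pattern: "x"))
```

In Swift, a struct with `let` properties would give the same immutability plus a free memberwise initializer; the sketch keeps a class only to match the declaration above.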

Update Class Before Callers

I had a choice: I could start from the callers’ point of view, or inside the QuerySpec class. I usually find it easier to start inside – I can quickly find out if the new object makes sense.

2. Add a third array, of the new class.

  let columns: [ColumnSpec]

3. Make the constructor(s) correctly populate the new array when they populate the other arrays.

4. Make any mutating methods mutate the new array the same way they do the originals.

(In my case, QuerySpec is a value object, so the “mutators” actually create new objects.)
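Steps 2 through 4 might look roughly like the following sketch. FilterSpec is a placeholder, ColumnSpec is written as a struct for brevity, and the swapping method is a hypothetical "mutator"; none of these details come from the real code.

```swift
// Hypothetical stand-in: the article does not show FilterSpec's contents.
struct FilterSpec: Equatable {
  let pattern: String
}

struct ColumnSpec: Equatable {
  let columnNumber: Int
  let filter: FilterSpec
}

final class QuerySpec {
  let columnNumbers: [Int]
  let filters: [FilterSpec]
  let columns: [ColumnSpec]  // step 2: the third array, alongside the old two

  init(columnNumbers: [Int], filters: [FilterSpec]) {
    precondition(columnNumbers.count == filters.count, "arrays must be parallel")
    self.columnNumbers = columnNumbers
    self.filters = filters
    // Step 3: populate the new array from the same inputs as the old ones.
    self.columns = zip(columnNumbers, filters).map { pair in
      ColumnSpec(columnNumber: pair.0, filter: pair.1)
    }
  }

  // Step 4: a hypothetical "mutator". Because QuerySpec is a value object it
  // returns a new instance, and the initializer keeps all three arrays in sync.
  func swapping(_ i: Int, _ j: Int) -> QuerySpec {
    var numbers = columnNumbers
    var newFilters = filters
    numbers.swapAt(i, j)
    newFilters.swapAt(i, j)
    return QuerySpec(columnNumbers: numbers, filters: newFilters)
  }
}

let original = QuerySpec(
  columnNumbers: [3, 1],
  filters: [FilterSpec(pattern: "a"), FilterSpec(pattern: "b")])
let rearranged = original.swapping(0, 1)
```

With a value object, routing every "mutation" through the initializer means the new array is derived in exactly one place. A class with truly mutable arrays would instead need each mutating method updated by hand.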

Flip 1: Read the New Array (& Drop the Parallel Arrays)

Once the new array is always consistent with the original parallel arrays, we’ve done the hard part and can now carry the change through.

5. For each “access” method which reads from one or both of the original arrays: make the method pull from the new array rather than the original.

  public func isEmpty() -> Bool {
    return columnNumbers.count == 0
  }
=>
  public func isEmpty() -> Bool {
    return columns.count == 0
  }

6. Once these are done, the original arrays are set and possibly modified, but never consulted again. Delete the arrays and their usages.
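After steps 5 and 6, the class might end up shaped like this sketch. The columnNumbers() accessor is hypothetical: it shows one way a caller that still wants plain integers could be served from the new array. FilterSpec is again a placeholder, and ColumnSpec is a struct here for brevity.

```swift
// Hypothetical stand-in: the article does not show FilterSpec's contents.
struct FilterSpec: Equatable {
  let pattern: String
}

struct ColumnSpec: Equatable {
  let columnNumber: Int
  let filter: FilterSpec
}

final class QuerySpec {
  let columns: [ColumnSpec]  // the only array left

  init(columns: [ColumnSpec]) {
    self.columns = columns
  }

  func isEmpty() -> Bool {
    return columns.count == 0
  }

  // Hypothetical accessor: derives the old view from the new array on demand
  // instead of storing it.
  func columnNumbers() -> [Int] {
    return columns.map { $0.columnNumber }
  }
}

let query = QuerySpec(columns: [
  ColumnSpec(columnNumber: 3, filter: FilterSpec(pattern: "a")),
  ColumnSpec(columnNumber: 1, filter: FilterSpec(pattern: "b")),
])
let empty = QuerySpec(columns: [])
```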

Flip 2: Update Callers

The class is better – it got rid of the parallel arrays. But it may reveal vestiges of the old way of doing business.

7. For each method, look at its interface, i.e., its parameters and return values. If they use the types of the original arrays, modify them to the new type if possible.

For my code, the constructors were the worst offenders. For example, there was a constructor taking in an array of one type, and another taking parallel arrays of both types.

These changes only affected a few call sites on the production side, but many tests were more tied to the old API. I introduced a couple of helper functions to make these tests read better, and to be (retroactively) a little more robust to this type change.
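The helpers themselves are not shown, so the following is only a guess at their general shape: a test-side function that builds a QuerySpec from bare column numbers, so most tests never mention ColumnSpec directly. The makeQuerySpec name and the FilterSpec.acceptAll default are both hypothetical.

```swift
// Hypothetical stand-in: the article does not show FilterSpec's contents.
struct FilterSpec: Equatable {
  let pattern: String
  // Hypothetical default filter for tests that do not care about filtering.
  static let acceptAll = FilterSpec(pattern: "")
}

struct ColumnSpec: Equatable {
  let columnNumber: Int
  let filter: FilterSpec
}

final class QuerySpec {
  let columns: [ColumnSpec]
  init(columns: [ColumnSpec]) {
    self.columns = columns
  }
}

// Hypothetical test helper: hides how columns are represented, so a later
// change to ColumnSpec touches this one function rather than every test.
func makeQuerySpec(columnNumbers: Int...) -> QuerySpec {
  let columns = columnNumbers.map {
    ColumnSpec(columnNumber: $0, filter: .acceptAll)
  }
  return QuerySpec(columns: columns)
}

let testSpec = makeQuerySpec(columnNumbers: 2, 0, 1)
```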

8. Focusing on the original class and its callers, is there any need to rebalance functionality? Can anything move to the new class?

I had at least one place that needed this sort of change:

class QuerySpec {
  ...
  func accepts(column: Int, data: String) -> Bool {
    return columns[column].filter.accepts(data)
  }
}

=> 

class QuerySpec {
  ...
  func accepts(column: Int, data: String) -> Bool {
    return columns[column].accepts(data)
  }
}

class ColumnSpec {
  ...
  func accepts(_ data: String) -> Bool {
    return filter.accepts(data)
  }
}

In some ways, this change is “just” passing the buck. But it puts decisions in the hands of the objects that have the information they need to decide. That makes the ColumnSpec more useful on its own, without dragging along the whole QuerySpec. (It does make testing the higher object a little more work, admittedly.)

That little move is also an example of “Hide Delegate”. The original columns[column].filter.accepts(data) is a (small) “train wreck” – it knows the navigation structure to get down to a filter. This adds friction if we want to change that structure. Switching to only asking ColumnSpec isolates it from that kind of change.
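As a small illustration of ColumnSpec being useful on its own, here is a sketch that exercises it without any QuerySpec in sight. The prefix-matching FilterSpec is purely an assumption made so the example runs; the real filter logic is not shown.

```swift
// Hypothetical filter: accepts strings that start with a given prefix.
struct FilterSpec {
  let prefix: String

  func accepts(_ data: String) -> Bool {
    return data.hasPrefix(prefix)
  }
}

struct ColumnSpec {
  let columnNumber: Int
  let filter: FilterSpec

  // Callers ask the column; only ColumnSpec knows it delegates to a filter.
  func accepts(_ data: String) -> Bool {
    return filter.accepts(data)
  }
}

let nameColumn = ColumnSpec(columnNumber: 0, filter: FilterSpec(prefix: "A"))
let acceptsAlice = nameColumn.accepts("Alice")
let acceptsBob = nameColumn.accepts("Bob")
```

If ColumnSpec later combined several filters or cached results, callers of accepts would not need to change.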

Conclusion

This refactoring has a bit of a dramatic arc.

We pursue this refactoring because we’re worried about duplication.

Our first step is to make things worse! We increase the complexity of the system (new class and new array), and then increase the duplication (as any mutating methods have to update the new array with parallel – duplicating – code).

But once we have the new array just as complex as the original ones, we start cutting away. The class gets simpler and we get happier.

But the world isn’t quite done with us – our callers may still be assuming our original structure, and we have to do some work to get them on the same wavelength. A happy ending :)

This refactoring isn’t super common. Perhaps I could have avoided it if I’d wrapped the Int as well as the array of Int. But that need wasn’t obvious to me at first.

Keep an eye out for parallel arrays – by the time you have two, you definitely should consider this refactoring!