Reversible, Incremental Data Structures

Objects usually track state, but you can instead track history and derive the current state from the history. (For example, databases blend both modes behind the scenes.)

Why might history be helpful? One reason is that it can make it very easy to reverse the effect of operations. If the object stores the history of mutations, it can quickly restore an older version. This is useful for undo and for backtracking.

History of What?

There are three basic approaches to history: store the history of states, store the history of mutations (a “trace” of the mutating calls), or store the current state in terms of previous states.

Consider an immutable array. Storing the history of array values might take a lot of space, whereas just storing assignments might take a lot less space but be slower to access.
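To make the trade-off concrete, here's a rough sketch in Swift (SnapshotArray and TraceArray are my names, not from the article or any library): one wrapper keeps a full copy of the array for every change, the other keeps only a trace of assignments and replays it on demand.

struct SnapshotArray {
    // History of states: every version stored in full.
    private var versions: [[Int]]

    init(size: Int) { versions = [Array(repeating: 0, count: size)] }

    var current: [Int] { versions.last! }

    mutating func set(_ index: Int, to value: Int) {
        var next = current                  // copy the whole array...
        next[index] = value
        versions.append(next)               // ...so space is O(n) per change
    }

    mutating func undo() { if versions.count > 1 { versions.removeLast() } }
}

struct TraceArray {
    // History of mutations: only the assignments are stored.
    private let size: Int
    private var trace: [(index: Int, value: Int)] = []   // O(1) space per change

    init(size: Int) { self.size = size }

    mutating func set(_ index: Int, to value: Int) { trace.append((index, value)) }

    mutating func undo() { _ = trace.popLast() }

    // Slower access: rebuild the current state by replaying the trace.
    var current: [Int] {
        var a = Array(repeating: 0, count: size)
        for step in trace { a[step.index] = step.value }
        return a
    }
}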

Backtracking: Polyominos

Storing the state in terms of earlier states shines when you’re doing a lot of backtracking. The polyomino puzzle I was working on a few weeks ago is a case in point. There, the goal is to identify all possible overlaps, so it’s very common to place tiles, remove them, and try various alternatives.

If you just build up the grid (the state), you might get this:

xxx..
xxx..
xx...
xxxxx
..xx.

But if we don’t keep track of the last piece, how will we reconstruct the previous grid?

xxx..
xxx..
xxx..
xx...
.....

Instead of storing the whole board, we’ll store the pieces, either erasing the last one or rebuilding the grid from scratch without it.
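Here’s a rough sketch of that in Swift (the types Cell, Placement, and Grid are mine, not the article’s): the grid keeps a set of occupied cells plus the list of placements, and undoes a move by erasing the last piece’s cells.

struct Cell: Hashable { let row, col: Int }

// One placement: the absolute cells a piece covers on the grid.
struct Placement { let cells: [Cell] }

struct Grid {
    private(set) var occupied: Set<Cell> = []
    private var placements: [Placement] = []

    mutating func place(_ p: Placement) {
        placements.append(p)
        for c in p.cells { occupied.insert(c) }
    }

    // Undo by erasing the cells of the last piece placed. (If placements
    // were allowed to overlap, we'd rebuild `occupied` from the remaining
    // placements instead of erasing.)
    mutating func removeLast() {
        guard let p = placements.popLast() else { return }
        for c in p.cells { occupied.remove(c) }
    }
}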

Now take a bigger problem: a chess game has 64 squares that can be empty, or one of several white or black pieces. We might be examining millions of boards, with lots of variations and backtracking. 

Constructor, Mutator, Accessor

One way to consider methods is to divide them up into constructors, mutators, and accessors. 

For some data structures, we can identify a reduced form: some combination of constructor and mutator calls sufficient to represent any value of the type.

In effect, mutators take us from one reduced form to another. (This doesn’t imply anything about how it’s actually implemented, just a way to understand it.)

Some mutators build up a new structure; others restore an old value.

Let’s consider the stack, the ATM of formal methods:

new() – create an empty stack
isEmpty(s) – true iff the stack contains no values
push(s, v) – add an item to the top of the stack
top(s) – reveal the value at the top of the stack
pop(s) – remove the item at the top of the stack
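Concretely, every stack value can be written using only new() and push() calls; that’s its reduced form. The other operations can then be read as rules over those forms (a sketch of the idea, in the same informal notation):

pop(push(s, v)) = s – pop takes us back to an earlier reduced form
top(push(s, v)) = v – top reads a reduced form without changing it
isEmpty(new()) = true
isEmpty(push(s, v)) = false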

Direct Implementation

A direct implementation doesn’t store the accumulated data structure, but rather stores its arguments, which are treated as immutable.

For example, here’s an implementation of the above spec:

new = nil
isEmpty(s) = s == nil
push(s,v) = Pair(stack:s, value:v)
pop(s) = s == nil ? error : s.stack
top(s) = s == nil ? error : s.value

This approach shares the partial stacks:

[Figure: a set of 4 stacks built from overlapping sub-stacks]

In this case, I’m treating “nil” as an empty pointer, but in general it could be a special node. 

We have four stacks, with 7 nodes. Without sharing, they would need 10 nodes.

Only the method (push) that creates new structures adds a node. Pop restores an earlier state. 
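For concreteness, here’s one possible Swift rendering of that direct implementation (the enum encoding and the error handling are my choices, not the article’s). Each push allocates a node holding a value and a reference to the previous stack; pop just returns that reference, so earlier stacks are shared rather than rebuilt.

indirect enum Stack<Value> {
    case empty                               // plays the role of "nil"
    case pair(stack: Stack, value: Value)

    static func new() -> Stack { .empty }

    var isEmpty: Bool {
        if case .empty = self { return true }
        return false
    }

    func push(_ v: Value) -> Stack { .pair(stack: self, value: v) }

    func pop() -> Stack {
        guard case .pair(let stack, _) = self else { fatalError("pop of empty stack") }
        return stack                         // restore the earlier, shared stack
    }

    func top() -> Value {
        guard case .pair(_, let value) = self else { fatalError("top of empty stack") }
        return value
    }
}

// Sharing in action: s2 and s3 both reuse s1's nodes.
let s1 = Stack<Int>.new().push(1).push(2)
let s2 = s1.push(3)
let s3 = s1.pop().push(4)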

Polyominos with an Incremental Structure

Suppose we define our polyomino board this way. (This time I’m assuming an implicit object.)

new() - create an empty board
isEmpty() - tell if the board is empty
place(shape, position) - build a board from the current board, adding shape at a position
remove() - return a board with its last-added shape removed
isOccupied(position) - tell if a square is in use

The place() method will use a direct implementation, holding the shape, position, and a reference to the previous board.

We might not need remove(), depending on how things are built.

The isOccupied method is the most interesting, something like this:

func isOccupied(coordinate) → Boolean {
  if shape.isOccupied(coordinate - position) { return true }
  else { return previousBoard.isOccupied(coordinate) }
}

(where the empty board always returns false).
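Pulling the pieces together, here’s a Swift sketch of the incremental board (Point, Shape, and the enum encoding are my assumptions, not the article’s): place() wraps the current board instead of mutating it, and isOccupied() answers locally when its own shape covers the square, otherwise delegating to the earlier board.

struct Point: Hashable {
    let x, y: Int
    static func - (a: Point, b: Point) -> Point { Point(x: a.x - b.x, y: a.y - b.y) }
}

struct Shape {
    let cells: Set<Point>                    // cells relative to the shape's origin
    func isOccupied(_ p: Point) -> Bool { cells.contains(p) }
}

indirect enum Board {
    case empty
    case placed(shape: Shape, position: Point, previous: Board)

    // Build a new board from the current one; "removing" is just holding
    // on to a reference to the previous board.
    func place(_ shape: Shape, at position: Point) -> Board {
        .placed(shape: shape, position: position, previous: self)
    }

    func isOccupied(_ p: Point) -> Bool {
        switch self {
        case .empty:
            return false                     // the empty board always says no
        case .placed(let shape, let position, let previous):
            // Answer locally if this shape covers the square;
            // otherwise delegate to the earlier board.
            if shape.isOccupied(p - position) { return true }
            return previous.isOccupied(p)
        }
    }
}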

We end up with immutable objects, so returning to an earlier state is just a matter of keeping a reference to it. This is perfect for our backtracking algorithm.

It’s a good example of a Chain of Responsibility – do something locally if possible, else delegate recursively. There’s another metaphor: imagine each board as a stack of panes of glass. To place a shape is to add another pane of glass (with some parts opaque). When you want to decide if a coordinate is occupied, you look through the panes. 

Conclusion

By building new instances in terms of earlier instances, you can make it much easier to manage undo or backtracking. Methods can use a Chain of Responsibility to answer easy questions directly, and answer harder questions by consulting other versions. 

Further Reading

“The Gerrymander Game, Part 1 – Puzzle”, by Bill Wake. Retrieved 2021-05-25.

Data Types and Data Structures, by Johannes J. Martin. Prentice-Hall, 1986. Old, but a good introduction to formal methods, including the idea of direct implementation.