A RowFixture is used to test that a set of items is as expected. The fixture flags surplus or missing items. They look like this:
MyRowFixture | ||
first | last | status() |
Alexander | the Great | ok |
Alexander | the Mediocre | unknown |
Winnie | the Pooh | lagging |
Each row represents a domain object of some sort. The columns have inputs and outputs, as for ColumnFixtures.
Data and Abstract Methods
First, I’ll note that this fixture extends ColumnFixture. This lets it pick up bind() and check(). The former handles the “header” row; the latter makes sure execute() is called. But due to the way the overrides happen, that method is called under different circumstances than for ColumnFixture. I don’t see anything that will call reset() on a per-row basis.
The fixture holds three bits of data: an array containing the results of the query, a list of surplus items, and a list of missing items. From showing usages, I see that the list of surplus items is a list of domain objects; the missing list is a list of Parses.
The first abstract method is query(). It is responsible for producing an array of the “actual” results. The second abstract method is getTargetClass(). It returns the class object representing the type of the row. It’s abstract for an interesting reason: the parent class ColumnFixture defines that method to return the type of the fixture itself. That would just lead to weird errors. By making it abstract, it forces the user to override it.
This is an interesting twist – usually my abstract methods are at the top of the hierarchy, and may get filled in along the way down. In this case, the method is becoming abstract in the middle.
In a sense, that happens because RowFixture and ColumnFixture have a slightly strained relationship. Maybe I’m just not getting why the latter is an example of the former; it feels like the inheritance is more for implementation than anything.
doRows() – The Overall Algorithm
The main algorithm is in doRows(): bind the columns (ala ColumnFixture), run the query() to get a list of actuals, run a match(), then add rows for any surplus or missing items.
Along the way, this method calls two overloaded list() methods: one for making a list of Parses, the other for making a list of objects. This parallel structure (methods for each of the two main data structures) continues across the class.
Method doRows() calls buildRows() to add in new rows for the surplus values. This method works by building up a “fake” head of a parse chain, then adding each item to the last one in the list. In the end, it throws away the head and returns the interesting part of the list, which gets attached to the end of the table. This seems like a little pattern worth remembering if I ever need to add rows to a fixture.
match() – The Heart
The match() routine is a recursive algorithm. Given the list of expected items, the list of computed items, and a column to start looking in, it figures out what matches and what’s missing or surplus.
Since this algorithm looks complicated, I’m going to start by just looking around. First, what are the places that call it? By doRows() certainly, since that’s how we got here. Then it’s called recursively at two places inside match() itself. The good news is we don’t have some sort of pair of mutually recursive methods.
Recursive algorithms have a base case and a recursion case. The recursive case here is just incrementing column, and passing along lists. The column is always incrementing, and the first if says that if we’ve exceeded the number of columns, we should just do a check on the lists. That makes it look like we’ll always terminate: we either increment column, in which case we’ll eventually stop, or we do something that doesn’t recurse, which will stop as well. (Or rather, if it doesn’t, it won’t be because of the recursion here.)
The other thing to look at on these recursive calls is the lists. We know the column gets bigger – do the lists get smaller? One case passes on the originals, so we know it’s no bigger. The other is trickier – I see things that seem to indicate that the lists will shrink (tests for 0 or 1 item), but it’s not obvious that it must be so without a little digging.
So, from the top of match(): the first case we mentioned before – if we’re past the number of columns, do a check() on the currents lists. (We’ll come back to that method later.) The second case says that if the current column binding is null, move on to the next column. The third and final case is where the meat is: we’re in a column in the middle, trying to match.
So, we build up two maps: one for the expected, one for the computed (“actual”). Each map has a list of items that have the given value at the chosen column. We pull out the keys in either list, and work our way through them. Here, there are four interesting cases:
- The expected list is empty – we have a value found only in the computed list, so add it to the surplus list.
- The computed list is empty – we have a value found only in the expected list, so add it to the missing list.
- There is only one value in each list, and they have the same key (by how we got here) – check this row (actual vs. computed).
- Finally, both lists have more than one item with the same key value – recurse, but only on the list of items with this same key value. (Now I can see that I’m recursing on a list that’s no bigger. It could be the same size if all the keys are the same.)
I’m left to wonder – does this work as a set or a multi-set? That is, can we have completely duplicate rows if the “same” object is in the list twice? I’ll come back to this.
eSort() and cSort()
There are two “sort” routines, one for expected and one for computed. They’re pretty similar, so I’ll describe them together. (I don’t get why they’re a sort of any sort, though.)
Each routine produces a Map, from key values to a List of Objects (either domain objects or Parses). The bin() method takes care of putting items in the map. That method expands an Array into a List; the RowFixtureTest mentions a bug in that neighborhood and I suspect this is to address that.
The sort() methods handle exceptions and rows with no value in a particular cell.
Back to the Algorithm
I think I understand what’s going on enough to put it into words now. To make a match, we start in a given column. Each list gets divided into buckets, based on the value of the cell at that column. If buckets have 0 or 1 item, then we have a good enough match. Otherwise, we’ll look at more columns for those items. Eventually, we’ll run out of items or columns.
check()
The last interesting bit is around the check() routine. It goes through and checks the columns one at a time, using the normal TypeAdapter facilities. The routine is recursive: it peels off the front of each list, recursing until one or both lists is empty.
Leftovers and Learnings
I had a question about whether it acted like a multi-set or a set. It looks like it’s basically multi-set-like, from a simple test with a list of integers.
The other big thing for me to wonder is how I’d have done a similar fixture. I think I’d have expected a simple set. The problem is, that’s fine for the query() side, but not so good for the “expected” side: how would you get from those values to construct the objects to compare as sets? (Knowing the contents of an object’s fields and return values from its methods doesn’t tell you how to construct it.)
Another alternative would be to get the query values, and match each one up against the rows. The naive algorithm for this is a little slow (n^2). It might be a bit simpler. I suspect its report wouldn’t be as nice.
The current algorithm is able to take advantage of partial matches – if enough data cells make it a unique match, it can then know it has the “right” element even though some of the fields/methods are wrong.
Closing Out…
That concludes my tour of Fit. I focused on the main fit package, skipping a couple more minor classes. The code reading was a good exercise for me – I have a better sense of some of the tradeoffs in the code, and of the dynamics in the Fit community.
=====
The series: