TDD for Sorting, Using Comparators

For most sorting problems, we use a pre-existing algorithm. Then, our job is to create a comparator that reflects the desired order.

Bars sorted by height

What is Sorting?

Suppose you have a sequence of items a[0]…a[n-1], and an operator “<=”. The sorted sequence meets these two requirements:

  1. Permutation: The result is a “scrambling” of the original; nothing added, duplicated, or deleted, and
  2. Order: sort(a)[i] <= sort(a)[i+1]

An important characteristic of sorting algorithms is whether they’re stable. Stability means that if i < j and a[i] == a[j], then a[i] precedes a[j] in the sorted list. Insertion sort is stable; quicksort, which is potentially faster, is not. 

Why might we want stability?

  • For a user looking at a list, they usually prefer that it not change unnecessarily.
  • Stable algorithms often handle nearly-sorted input more quickly. Contrast quicksort, where sorted input is its worst case. 
  • A stable algorithm lets you sort keys one at a time (though it doesn’t require this). By sorting from least-significant to most-significant key, you can get the effect of a multi-key sort even if you only have single-key sorting.

Whether your sorting algorithm is stable or not depends on your language and its libraries. The standard sorts for Java and Swift are stable; the one for C# is not. 

Comparators

A comparator is a function that takes two arguments, and returns information about their ordering. Typically there are two ways of sorting: objects define their own comparison algorithm used by a default sort(), or there’s a sort that takes a comparator for more flexibility. 

There are several styles of comparators, depending on your language. 

  1. Less than (<) – return a boolean
  2. Less than or equal (<=) – return a boolean
  3. NZP – return a Negative number if the first is less than the second, Zero if they’re equal, and a Positive number if the second number is less than the first. 
  4. Sign – return -1, 0, or +1, for <, ==, or >.
  5. Enum – return a value telling whether the two items are in ascending order, equal, or descending. 

The Basic Test: Simple Comparison

The basic test is: given a and b, what’s the result of compare(a,b)? Even if it’s a boolean value, I check the three basic cases:

  a < b, a == b, and a > b

Use data values with “just noticeable differences”. That is, 10<11 is usually a better test than 10<100. 

If your data values don’t have a smooth linear arrangement, you’ll want to test more values to cover the weird cases. For example, musical notes have a letter A-G optionally followed by an accidental: sharp, flat, double-sharp, double-flat, or natural. You can’t just go by the letter to know the relative pitch: F 𝄫 is lower than E. 

A point of caution: If your comparator returns an integer, it may be tempting to return the result of subtracting the second value from the first. This only works if you know your subtraction will stay in range. But Integer.MIN_VALUE – 1 will go out of range, even though MIN_VALUE is less than 1. You’re usually just better off checking for < or <=. 

Checking Equality

Equality has several properties:

  1. Reflexive: a == a
  2. Symmetric: If a == b, then b == a
  3. Transitive: If a == b and b ==c, then a == c

A key distinction is value types vs. reference types. An instance of a value type is equal to another based on all its contained values; reference types are equal based on identity (either pointer equality or an explicit identity field). 

A second concern is shallow vs. deep equality. Shallow equality is defined by looking at a single object’s fields; deep equality recursively compares other objects it points to. Which one you’re using is a design decision. 

I learned this pattern for checking value types for equality, from James Shore: 

  let a1 = some value
  let a2 = the same value in a different instance
  let b = a different value 
  Check that a1 == a2 
  Check that a1 != b

(James also checks hash(a1) == hash(a2) and their string values while he’s got their attention.)

Add the reflexive, symmetric, and transitive checks if you have any concerns about them. 

Global Ordering

If you’ve ever played Paper-Rock-Scissors, you know that Paper covers Rock, Rock breaks Scissors, and Scissors cut Paper. That is, we can compare individual pairs, but not have a “least” item. 

Sorting requires that we know how to compare pairs, but also that there’s a global ordering, which you’ll have if you meet the transitive rule:

  if a < b and b < c, then a < c

Occasionally, you have a domain that’s only partially ordered. Class prerequisites are like that: CS101 < CS102, and Econ301 < Econ302, but we don’t constrain the order of CS vs Econ classes. (No loops are allowed in a partial ordering.)

For partial ordering, a standard sorting algorithm is not the right choice. Instead, look for topological sorting, which lets you put things in an order consistent with its partial order, but may have multiple solutions. [See Further Reading.]

Non-Comparable Values

Some domains have exceptional values that don’t compare normally. For example, doubles have NaN, for which these are all true:

  !(0 < NaN)
  !(0 == NaN)  and
  !(0 > NaN)

Some sorting algorithms are ok with values like this; others are not. You’ll need to look at your documentation to decide. 

You may be able to create a comparator that compares the exceptional value consistently even if its natural order is not. For example, you could enforce that NaN sorts less than any non-NaN value.  

Compound Keys

You may want to sort by more than one field (i.e., primary, secondary, … keys). A classic example is dates: you want to sort by year, but if years are equal sort by month, and if months are equal sort by day. 

One way to do this is with a comparator that looks at the relevant fields. Here’s an example in Swift, implementing “<“:

  if lhs.year != rhs.year {
    return lhs.year < rhs.year
  } else if lhs.month != rhs.month {
    return lhs.month < rhs.month 
  } else { 
    return lhs.day < rhs.day
  }

With this approach, it takes several test cases to check the possibilities:

  2000-10-5 < 2001-9-4
  2001-9-4 > 2000-10-5

  2000-7-5 < 2000-8-4
  2000-8-4 > 2000-7-5

  2000-7-4 < 2000-7-5
  2000-7-5 > 2000-7-4

  2000-7-5 == 2000-7-5

  • I’m checking true and false returns for each path.
  • I’m using minimal differences.
  • For fields that don’t participate in a particular comparison, I make sure they’ll return the opposite answer, so I can tell if I’ve compared the wrong key. 

There are other ways to deal with primary and secondary keys. Java supports andThen() and reversed(), which let you build a complex comparator from a simple one. Swift has at least one place that allows for an array of comparators.

Even if your language doesn’t include these, you can certainly write a comparator that takes a list of comparators and tests them in order. 

Conclusion

I didn’t expect this article to be quite so long:)

We’ve looked at what sorting means, and how many sort algorithms take a comparator that compares two elements. We started with a test that checks for <, =, or >, and looked at special issues around equality testing. 

We considered the need for a global ordering (enabled by the transitive law) to ensure we can sort properly. We finished by looking at a couple interesting possibilities: non-comparable values and multi-key sorting. 

Further Reading

“Arrays – sort” [Java], https://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#sort(T[],%20java.util.Comparator) 

Fundamental Algorithms: The Art of Computer Programming, vol. 1, by D.E. Knuth. Section 2.2.3 discusses topological sorting.

“List<T>.Sort Method” [C#], https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.sort?view=net-7.0 

“sort(by:)” [Swift], https://developer.apple.com/documentation/swift/array/sort(by:)