Acceptance Tests for a Query System

Background

Suppose we’re testing a library search system. The library has a database of book descriptions, and we give it queries. Each query produces a result set of the matching items.

When we consider this system, we might think of the database as a collection of records, and think of the query as a string. The query string represents a boolean expression, so we want to test its logical nature. Finally, the query string conforms to a grammar, so it might be well- or ill-formed.

Test Strategy

To get ideas on testing, we might go to a site like https://www.testing.com, and see Brian Marick’s suggestions. Summarizing:

  • Collection: Test 0, 1, many, and duplicates
  • String: Test blank or not
  • Boolean: Test logical operators
  • Integer: Test 0, smallest, just below smallest, largest, and just above largest.

To this we’ll add a test for “grammatical or not.”

How will we define the tests? These tests work well with a spreadsheet. We can have one table for the database contents, and another for queries.

Data
Author Title Year
Queries
Query Test Value Comment

The semantics will be: load each line of the data table (after the first two), then run each query. Apply the test (with its optional value) to the result. The tests I have in mind are:

  • count – tell the number of items in the result
  • fails – query gets an error (“value” column unused)
  • contains – verifies that a particular phrase is in the result

We might find we need to extend this list later, but this is a good starting point. We won’t even use “contains” in these examples.

Queries

Queries have a simple structure:

query = term
query = term op term
op = AND | OR | NOT

The precedence is left to right, so “fish or beef and steak” is interpreted “((fish or beef) and steak” (though the parentheses are not allowed). Matching should not be case sensitive; the operator names are not case sensitive either. Blanks are ignored.

The NOT operator may need a little explanation. In classical logic, not is a unary operator. In the search system, “a NOT b” is interpreted “a AND NOT b” (but the latter is not a legal expression in our query language). The NOT operator is used to eliminate unwanted terms. Thus, “extreme AND programming NOT sports” would match “Extreme Programming Explored” but not “Is Extreme Programming an Extreme Sport?”

Test Strategy Revisited

We’ll start with the collection rules, and make four separate test databases: empty, one record, many records, and duplicates. For each of these, we’ll try a variety of queries.

We’ll test with a blank and non-blank queries, and we’ll try grammatical and non-grammatical queries. We’ll apply the logical operator testing rules (which we haven’t covered yet).

For the integer values, there are really only two places that use integers: the number of records in the collection (which we covered), and the size of the result. Results can have anywhere from zero to the number of items in the database; only 0 and “max” apply from the testing rules. But we’ll certainly hit a result set of size “1” as well as some intermediate values.

The Simplest Test: An Empty Collection

What can we learn from an empty collection? It seems like that isn’t even worth trying–who would have a library with only one book? So: the data set is empty. (We may not even create the spreadsheet page for it.) Let’s cover the test case for a blank or non-blank query string:

Queries
Query Test Value Comment
count 0 Blank query
fish count 0 Non-blank query

Install this test and run it. That will show you the first reason to start with the simplest test: setting up the environment and running the test is non-trivial. You’d rather start with a small test set. In our example, there’s a second reason it pays to start with an empty collection: we decided that a blank query returns a result with 0 items. We could have decided that it was an error. Murphy’s Law tells you that if you didn’t say, the programmers would have done the opposite of what you ended up wanting.

We can also test non-grammatical queries with an empty collection. It seems reasonable that error checking would happen before the query actually does its work, so it won’t matter if the collection is empty or not. (And a quick chat with the programmers confirms our intuition.)

Queries
Query Test Value Comment
fish AND error Nothing after AND
fish OR error Nothing after OR
fish NOT error Nothing after NOT
AND fish error Nothing before AND
OR fish error Nothing before OR
NOT fish error Nothing before NOT
fish sticks error No operator

We can certainly find other non-grammatical queries to try. Finally, we might toss in one legal but complicated and tricky one:

Queries
Query Test Value Comment
AND AND OR OR NOT NOT NOT count 0 Ugly but legal

One-Element Collection

We need a table with one record:

Data
Author Title Year
William C. Wake Extreme Programming Explored 2001

(What else would I use?)

You might start by repeating the earlier queries; they’ll get the same answer. We can check that a record is found as expected for a simple query:

Queries
Query Test Value Comment
Explored count 1 Simple query
ExPLoRed count 1 Verify case insensitive

Brian Marick (www.testing.com) has suggested some good rules for testing booleans:

  • AND: try once with all conditions true, and once with each condition false and the others true
  • OR: try once with all conditions false, and once with each condition true and the others false
  • NOT: try once true, and once false

In our case, remember that our “NOT” operator is really “AND NOT”, so for “a NOT b” we’ll need to test (a true, b false), (a false, b false), and (a true, b true).

In the comments, TF/FT/etc. indicates which term is true and which is false.

Queries
Query Test Value Comment
Extreme AND Explored count 1 AND TT
Extreme AND sports count 0 AND TF
sports ANd programming count 0 AND FT
skiing Or skipping count 0 OR FF
Wake OR skateboard count 1 OR TF
dive OR wake count 1 OR FT
2001 NOT dancing count 1 NOT TF
dancing NOT singing count 0 NOT FF
2001 NoT extreme count 0 NOT TT

Note that we’ve checked a couple other things by having variety in the test cases: we’ve checked things in all three columns, and we’ve mixed case in both terms and operators. This can be beneficial or not. On the good side, it cuts down the number of test cases. On the bad side, it confounds two things. If “dive OR wake” fails, is it because matching is case-sensitive, or because there’s a problem in “OR”?

You’ll have to come to your own balance on this. If you find yourself unable to figure out what the problem is, it can be a sign to simplify your tests. This can be partially mitigated by keeping early tests simple, focused on one aspect, and let later tests build complexity.

We can explore a few other aspects of booleans. For one thing, there’s a set of identity rules: “A OR A = A”; “A AND A = A”; “A NOT A = false”. This shows up in our queries like this:

Queries
Query Test Value Comment
Extreme AND Extreme count 1 a AND a
Programming OR Programming count 1 a OR a
Explored NOT Explored count 0 a NOT a

We might want to verify that the term order doesn’t matter. (We want “Programming AND Extreme” to have the same result as “Extreme AND Programming”.)

Queries
Query Test Value Comment
Extreme AND Programming count 1 a AND b
Programming AND Extreme count 1 b AND a

Finally, we’d like to verify that operator precedence is right. Remember that “a OR b AND c” should be interpreted “(a OR b) AND c” rather than “a OR (b AND c)”. (Some grammars for logical operators, and most programming languages, treat AND as higher precedence than OR; we’d like to make sure that doesn’t bubble out to the query language.)

A few minutes thought (or a half hour of logical derivation:) will convince you that if “(a OR b) AND c” is true, then “a OR (b AND c)” will also be true. To show that the interpretation is wrong, we’d have to find a case where the second one matches but the first one doesn’t. This case exists when A matches, but B and C do not.

Queries
Query Test Value Comment
Extreme OR Sports AND Agile count 0 precedence

We’re definitely hitting some subtleties now. What happens if we don’t test this case? Are we doomed? What about the many other ways queries could be wrong?

You always have to draw a line somewhere. It took me about an hour to convince myself that the precedence problem could exist, to come up with an example, and to prove that the example could be logically derived. If I hadn’t done this, there’s a chance this bug would surface. The biggest danger is that the programmers would misunderstand the specification, and assume “normal” precedence. But as customer, I’m in the room with them talking about it. If we had just missed this subtlety, note that this is an obscure case in the grand scheme of things: I’ve tested one- and two-term queries more thoroughly, and those are by far the most common ones users will form.

That doesn’t completely excuse the potential problem. If I were worried enough, I could get someone to write a small program that generated “all” queries with 3 or 4 terms. (“All” meaning all combinations of operators and terms present or not, not “all” possible terms.) “a op b op c op d”: Each term might be present or not (2^4=16), and each operator could have one of 3 values (3*3*3), so there are 16*27=432 combinations I might look at. This is an uncomfortable number, but not unbearable. In normal use, I’d probably just pick a handful of problematic-looking ones, and trust the result.

Collection with Many Elements

We’ll define a bigger collection, and repeat our queries. (They’ll have slightly different result counts, but none will go from 0 to a positive number or vice versa.)

Data
Author Title Year
Kent Beck Extreme Programming Explained: Embrace Change 1999
Kent Beck and Martin Fowler Planning Extreme Programming 2000
Martin Fowler et al. Refactoring 1999
Ron Jeffries et al. Extreme Programming Installed 2000
William C. Wake Extreme Programming Explored 2001
Queries
Query Test Value Comment
Explored count 4 Simple query
ExPLoRed count 4 Verify case insensitive
Extreme AND Explored count 1 AND TT
Extreme AND sports count 0 AND TF
sports ANd programming count 0 AND FT
skiing Or skipping count 0 OR FF
Wake OR skateboard count 1 OR TF
dive OR wake count 1 OR FT
2001 NOT dancing count 1 NOT TF
dancing NOT singing count 0 NOT FF
2001 NoT extreme count 0 NOT TT
Extreme AND Extreme count 4 a AND a
Programming OR Programming count 4 a OR a
Explored NOT Explored count 0 a NOT a
Extreme AND Programming count 4 a AND b
Programming AND Extreme count 4 b AND a
Extreme OR Sports AND Agile count 0 precedence

We’ll also add a new query to cover the case “all records are in the result set.” We had this for the one-record case, but we’d feel better seeing it on a bigger set as well.

Queries
Query Test Value Comment
Programming OR Fowler count 5 Find all

Collection with Duplicates

Verify that duplicated entries are treated the same. We decide we won’t “de-dup” but rather just return what we find.

Data
Author Title Year
Kent Beck Extreme Programming Explained: Embrace Change 1999
Kent Beck Extreme Programming Explained: Embrace Change 1999
Kent Beck and Martin Fowler Planning Extreme Programming 2000
William C. Wake Extreme Programming Explored 2001
Martin Fowler et al. Refactoring 1999
Ron Jeffries et al. Extreme Programming Installed 2000
William C. Wake Extreme Programming Explored 2001
Kent Beck Extreme Programming Explained: Embrace Change 1999
Queries
Query Test Value Comment
Extreme count 7 Retain dups
Extreme OR Refactoring count 8 Find all with dups
Refactoring count 1 Find 1 with dups

Conclusion

We’ve created an initial set of tests for a simple query system that might show up in library software. Several things stand out:

  • A spreadsheet is a reasonable tool for capturing this type of test.
  • Some simple test rules (from www.testing.com) provided guidance on how to create test cases.
  • Notice that we have both the test query and its expected result.
  • Many tests are straightforward, but some of them bring out real issues in the specification.

Sidebar: Running the Tests

A spreadsheet provides the test creator with a comfortable, well-understood interface, and it can write easily processed files.

For the example we used, the tester and the programmer have to agree on the columns and their meanings, and the file format. When the tester is done writing a file, they can save it into a tab-delimited or bar-delimited file (whatever they agree on with the programmers). The programmers can write a simple program that will read the file and do the appropriate function. The programmers might generate code, or just have the program read and execute functions directly.

I think of the GUI (graphical user interface) as being lopped off or set aside, and the spreadsheet as providing an alternative interface. It’s often best to focus on feature testing first, without worrying about the GUI until the features are right. A spreadsheet is nice for this approach: the tester still gets a GUI, just not one customized to the problem at hand.

Resources and Related Articles

[May 25, 2001.]