Background
Suppose we’re testing a library search system. The library has a database of book descriptions, and we give it queries. Each query produces a result set of the matching items.
When we consider this system, we might think of the database as a collection of records, and think of the query as a string. The query string represents a boolean expression, so we want to test its logical nature. Finally, the query string conforms to a grammar, so it might be well- or ill-formed.
Test Strategy
To get ideas on testing, we might go to a site like https://www.testing.com, and see Brian Marick’s suggestions. Summarizing:
- Collection: Test 0, 1, many, and duplicates
- String: Test blank or not
- Boolean: Test logical operators
- Integer: Test 0, smallest, just below smallest, largest, and just above largest.
To this we’ll add a test for “grammatical or not.”
How will we define the tests? These tests work well with a spreadsheet. We can have one table for the database contents, and another for queries.
Data | ||
---|---|---|
Author | Title | Year |
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
The semantics will be: load each line of the data table (after the first two), then run each query. Apply the test (with its optional value) to the result. The tests I have in mind are:
- count – tell the number of items in the result
- fails – query gets an error (“value” column unused)
- contains – verifies that a particular phrase is in the result
We might find we need to extend this list later, but this is a good starting point. We won’t even use “contains” in these examples.
Queries
Queries have a simple structure:
query = term
query = term op term
op = AND | OR | NOT
The precedence is left to right, so “fish or beef and steak” is interpreted “((fish or beef) and steak” (though the parentheses are not allowed). Matching should not be case sensitive; the operator names are not case sensitive either. Blanks are ignored.
The NOT operator may need a little explanation. In classical logic, not is a unary operator. In the search system, “a NOT b” is interpreted “a AND NOT b” (but the latter is not a legal expression in our query language). The NOT operator is used to eliminate unwanted terms. Thus, “extreme AND programming NOT sports” would match “Extreme Programming Explored” but not “Is Extreme Programming an Extreme Sport?”
Test Strategy Revisited
We’ll start with the collection rules, and make four separate test databases: empty, one record, many records, and duplicates. For each of these, we’ll try a variety of queries.
We’ll test with a blank and non-blank queries, and we’ll try grammatical and non-grammatical queries. We’ll apply the logical operator testing rules (which we haven’t covered yet).
For the integer values, there are really only two places that use integers: the number of records in the collection (which we covered), and the size of the result. Results can have anywhere from zero to the number of items in the database; only 0 and “max” apply from the testing rules. But we’ll certainly hit a result set of size “1” as well as some intermediate values.
The Simplest Test: An Empty Collection
What can we learn from an empty collection? It seems like that isn’t even worth trying–who would have a library with only one book? So: the data set is empty. (We may not even create the spreadsheet page for it.) Let’s cover the test case for a blank or non-blank query string:
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
count | 0 | Blank query | |
fish | count | 0 | Non-blank query |
Install this test and run it. That will show you the first reason to start with the simplest test: setting up the environment and running the test is non-trivial. You’d rather start with a small test set. In our example, there’s a second reason it pays to start with an empty collection: we decided that a blank query returns a result with 0 items. We could have decided that it was an error. Murphy’s Law tells you that if you didn’t say, the programmers would have done the opposite of what you ended up wanting.
We can also test non-grammatical queries with an empty collection. It seems reasonable that error checking would happen before the query actually does its work, so it won’t matter if the collection is empty or not. (And a quick chat with the programmers confirms our intuition.)
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
fish AND | error | Nothing after AND | |
fish OR | error | Nothing after OR | |
fish NOT | error | Nothing after NOT | |
AND fish | error | Nothing before AND | |
OR fish | error | Nothing before OR | |
NOT fish | error | Nothing before NOT | |
fish sticks | error | No operator |
We can certainly find other non-grammatical queries to try. Finally, we might toss in one legal but complicated and tricky one:
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
AND AND OR OR NOT NOT NOT | count | 0 | Ugly but legal |
One-Element Collection
We need a table with one record:
Data | ||
---|---|---|
Author | Title | Year |
William C. Wake | Extreme Programming Explored | 2001 |
(What else would I use?)
You might start by repeating the earlier queries; they’ll get the same answer. We can check that a record is found as expected for a simple query:
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
Explored | count | 1 | Simple query |
ExPLoRed | count | 1 | Verify case insensitive |
Brian Marick (www.testing.com) has suggested some good rules for testing booleans:
- AND: try once with all conditions true, and once with each condition false and the others true
- OR: try once with all conditions false, and once with each condition true and the others false
- NOT: try once true, and once false
In our case, remember that our “NOT” operator is really “AND NOT”, so for “a NOT b” we’ll need to test (a true, b false), (a false, b false), and (a true, b true).
In the comments, TF/FT/etc. indicates which term is true and which is false.
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
Extreme AND Explored | count | 1 | AND TT |
Extreme AND sports | count | 0 | AND TF |
sports ANd programming | count | 0 | AND FT |
skiing Or skipping | count | 0 | OR FF |
Wake OR skateboard | count | 1 | OR TF |
dive OR wake | count | 1 | OR FT |
2001 NOT dancing | count | 1 | NOT TF |
dancing NOT singing | count | 0 | NOT FF |
2001 NoT extreme | count | 0 | NOT TT |
Note that we’ve checked a couple other things by having variety in the test cases: we’ve checked things in all three columns, and we’ve mixed case in both terms and operators. This can be beneficial or not. On the good side, it cuts down the number of test cases. On the bad side, it confounds two things. If “dive OR wake” fails, is it because matching is case-sensitive, or because there’s a problem in “OR”?
You’ll have to come to your own balance on this. If you find yourself unable to figure out what the problem is, it can be a sign to simplify your tests. This can be partially mitigated by keeping early tests simple, focused on one aspect, and let later tests build complexity.
We can explore a few other aspects of booleans. For one thing, there’s a set of identity rules: “A OR A = A”; “A AND A = A”; “A NOT A = false”. This shows up in our queries like this:
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
Extreme AND Extreme | count | 1 | a AND a |
Programming OR Programming | count | 1 | a OR a |
Explored NOT Explored | count | 0 | a NOT a |
We might want to verify that the term order doesn’t matter. (We want “Programming AND Extreme” to have the same result as “Extreme AND Programming”.)
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
Extreme AND Programming | count | 1 | a AND b |
Programming AND Extreme | count | 1 | b AND a |
Finally, we’d like to verify that operator precedence is right. Remember that “a OR b AND c” should be interpreted “(a OR b) AND c” rather than “a OR (b AND c)”. (Some grammars for logical operators, and most programming languages, treat AND as higher precedence than OR; we’d like to make sure that doesn’t bubble out to the query language.)
A few minutes thought (or a half hour of logical derivation:) will convince you that if “(a OR b) AND c” is true, then “a OR (b AND c)” will also be true. To show that the interpretation is wrong, we’d have to find a case where the second one matches but the first one doesn’t. This case exists when A matches, but B and C do not.
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
Extreme OR Sports AND Agile | count | 0 | precedence |
We’re definitely hitting some subtleties now. What happens if we don’t test this case? Are we doomed? What about the many other ways queries could be wrong?
You always have to draw a line somewhere. It took me about an hour to convince myself that the precedence problem could exist, to come up with an example, and to prove that the example could be logically derived. If I hadn’t done this, there’s a chance this bug would surface. The biggest danger is that the programmers would misunderstand the specification, and assume “normal” precedence. But as customer, I’m in the room with them talking about it. If we had just missed this subtlety, note that this is an obscure case in the grand scheme of things: I’ve tested one- and two-term queries more thoroughly, and those are by far the most common ones users will form.
That doesn’t completely excuse the potential problem. If I were worried enough, I could get someone to write a small program that generated “all” queries with 3 or 4 terms. (“All” meaning all combinations of operators and terms present or not, not “all” possible terms.) “a op b op c op d”: Each term might be present or not (2^4=16), and each operator could have one of 3 values (3*3*3), so there are 16*27=432 combinations I might look at. This is an uncomfortable number, but not unbearable. In normal use, I’d probably just pick a handful of problematic-looking ones, and trust the result.
Collection with Many Elements
We’ll define a bigger collection, and repeat our queries. (They’ll have slightly different result counts, but none will go from 0 to a positive number or vice versa.)
Data | ||
---|---|---|
Author | Title | Year |
Kent Beck | Extreme Programming Explained: Embrace Change | 1999 |
Kent Beck and Martin Fowler | Planning Extreme Programming | 2000 |
Martin Fowler et al. | Refactoring | 1999 |
Ron Jeffries et al. | Extreme Programming Installed | 2000 |
William C. Wake | Extreme Programming Explored | 2001 |
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
Explored | count | 4 | Simple query |
ExPLoRed | count | 4 | Verify case insensitive |
Extreme AND Explored | count | 1 | AND TT |
Extreme AND sports | count | 0 | AND TF |
sports ANd programming | count | 0 | AND FT |
skiing Or skipping | count | 0 | OR FF |
Wake OR skateboard | count | 1 | OR TF |
dive OR wake | count | 1 | OR FT |
2001 NOT dancing | count | 1 | NOT TF |
dancing NOT singing | count | 0 | NOT FF |
2001 NoT extreme | count | 0 | NOT TT |
Extreme AND Extreme | count | 4 | a AND a |
Programming OR Programming | count | 4 | a OR a |
Explored NOT Explored | count | 0 | a NOT a |
Extreme AND Programming | count | 4 | a AND b |
Programming AND Extreme | count | 4 | b AND a |
Extreme OR Sports AND Agile | count | 0 | precedence |
We’ll also add a new query to cover the case “all records are in the result set.” We had this for the one-record case, but we’d feel better seeing it on a bigger set as well.
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
Programming OR Fowler | count | 5 | Find all |
Collection with Duplicates
Verify that duplicated entries are treated the same. We decide we won’t “de-dup” but rather just return what we find.
Data | ||
---|---|---|
Author | Title | Year |
Kent Beck | Extreme Programming Explained: Embrace Change | 1999 |
Kent Beck | Extreme Programming Explained: Embrace Change | 1999 |
Kent Beck and Martin Fowler | Planning Extreme Programming | 2000 |
William C. Wake | Extreme Programming Explored | 2001 |
Martin Fowler et al. | Refactoring | 1999 |
Ron Jeffries et al. | Extreme Programming Installed | 2000 |
William C. Wake | Extreme Programming Explored | 2001 |
Kent Beck | Extreme Programming Explained: Embrace Change | 1999 |
Queries | |||
---|---|---|---|
Query | Test | Value | Comment |
Extreme | count | 7 | Retain dups |
Extreme OR Refactoring | count | 8 | Find all with dups |
Refactoring | count | 1 | Find 1 with dups |
Conclusion
We’ve created an initial set of tests for a simple query system that might show up in library software. Several things stand out:
- A spreadsheet is a reasonable tool for capturing this type of test.
- Some simple test rules (from www.testing.com) provided guidance on how to create test cases.
- Notice that we have both the test query and its expected result.
- Many tests are straightforward, but some of them bring out real issues in the specification.
Sidebar: Running the Tests
A spreadsheet provides the test creator with a comfortable, well-understood interface, and it can write easily processed files.
For the example we used, the tester and the programmer have to agree on the columns and their meanings, and the file format. When the tester is done writing a file, they can save it into a tab-delimited or bar-delimited file (whatever they agree on with the programmers). The programmers can write a simple program that will read the file and do the appropriate function. The programmers might generate code, or just have the program read and execute functions directly.
I think of the GUI (graphical user interface) as being lopped off or set aside, and the spreadsheet as providing an alternative interface. It’s often best to focus on feature testing first, without worrying about the GUI until the features are right. A spreadsheet is nice for this approach: the tester still gets a GUI, just not one customized to the problem at hand.
Resources and Related Articles
- Acceptance Test Mechanisms
- www.testing.com, Brian Marick’s site.
- A Fit Spreadsheet
[May 25, 2001.]