
Extreme Test Makeover – Agile ’06

Brian Marick and I hosted "Extreme Test Makeover," where people could bring their laptops with code and tests, and have an experienced tester/programmer review them. Observations by participants:

  • Watij tests in Fit are too long and confusing for the customer to read.
    • You could write it in JUnit instead of Fit
    • Break them up into small focused tests
  • Neat new delegate syntax (with .NET 2.0)
  • Descriptive variable names are good [even] for short-term variables.
  • Keep tests focused on one purpose – If a test needs 3 things to work, create 3 tests
  • Generic isn't always useful.

Thanks to our participants and our experts: Bob Martin, Brian Button, Janet Gregory, JB Rainsberger, Jim Newkirk, Jim Shore, Lisa Crispin, Micah Martin, Randy Coulman, and Ward Cunningham.

Semantics of Fit: A Path Toward New Tools

Fit's standard interpretation tells us how well a program does against a set of test cases. We can design new semantics for reporters (that give us interesting information) and for rewriters (that make interesting transformations of our tests).

The Standard Interpretation

A Fit test looks like a table (or a series of tables) in an HTML document:

Calculator
x y plus() times()
2 1 3 2
0 3 3 0
2 -1 1 -2

 

fit.ActionFixture
start ScientificCalculator  
enter value 2
press plus  
enter value 1
press equals  
check value 3

But what does a Fit test mean?

This is almost a trick question. One answer is: whatever people agree it means. Another answer is: whatever the Fit framework, the fixtures, and the application make it mean. These two things don't always line up, unfortunately.

Even when we think we're talking about the same interpretation, there can be differences in what a test means. One version of the Calculator fixture could work with the business objects; another could work at a presentation layer; another could work by manipulating the user interface. These fixtures might find different types of defects; they might be easier or harder to write; and so on.

These all generally assume a standard interpretation: use Fit to run the fixture on the table. The good side of this is that it gives us something we care about: an answer to the question, "How well does our program work?"

I'd like to move to a different question:

What can we know about our tests if we don't understand the fixtures?

Alternative Semantics

A different interpretation of the directory containing my test above might yield this table as a result:

Index
Fixture File Table Number
Calculator MyTest.html 1
Calculator OtherTest.html 3
fit.ActionFixture MyTest.html 2

This is a cross-reference chart, showing where every fixture is used. I can get this information with no reference to the fixture implementation at all: it's just an index of the first cell of each table.
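Here is a minimal sketch of such an index in Java. It assumes each test file is available as an HTML string and uses regular expressions to find tables and their first cells; nested tables are not handled, and a real tool would probably reuse Fit's own table parser instead. The names `FixtureIndex` and `index` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reporter: lists the fixture named in the first cell of each table. */
final class FixtureIndex {
    // Matches one <table>...</table> block; nested tables are not handled in this sketch.
    private static final Pattern TABLE = Pattern.compile(
        "<table\\b[^>]*>(.*?)</table>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches a <td> or <th> cell; only the first match in each table is used.
    private static final Pattern CELL = Pattern.compile(
        "<t[dh]\\b[^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns one "fixture file tableNumber" line per table found in the HTML. */
    static List<String> index(String fileName, String html) {
        List<String> rows = new ArrayList<>();
        Matcher tables = TABLE.matcher(html);
        int tableNumber = 0;
        while (tables.find()) {
            tableNumber++;
            Matcher cell = CELL.matcher(tables.group(1));
            if (cell.find()) {
                // Strip inline markup and surrounding whitespace from the cell text.
                String fixture = cell.group(1).replaceAll("<[^>]*>", "").trim();
                rows.add(fixture + " " + fileName + " " + tableNumber);
            }
        }
        return rows;
    }
}
```

Nothing here loads or runs a fixture class; the report comes entirely from the text of the test files.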

We can find out other interesting things with just a little knowledge of fixtures. For example, if we know that "Calculator" above is a ColumnFixture, then we know that the first row after the fixture name lists the inputs and outputs. Even without knowing anything about how the fixture is implemented, we can create another interesting table:

Vocabulary
Fixture Field Value
Calculator plus() 1
Calculator plus() 3
Calculator times() -2
Calculator times() 0
Calculator times() 2
Calculator x 0
Calculator x 2
Calculator y -1
Calculator y 1
Calculator y 3

This table gives you a picture of the input and output domains. A good tester could look at this and notice all kinds of things that weren't tested:

  • no large or small numbers (in either the input or output)
  • no non-numbers
  • no sums that resulted in 0
  • no sums that were negative
  • no sums that were even
  • only even numbers for x, odd numbers for y
  • minus(), divide()
  • etc.

We could create a tool that would give us a domain analysis of our test data:

Test Data for Calculator
Field Max Neg Neg Zero Pos Even Odd Max Pos
plus() no no no yes no yes no
times() no yes yes yes yes no no
x no no yes yes yes no no
y no yes no yes no yes no
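A rough sketch of the analysis behind one row of that table follows. It makes several assumptions of my own: values are plain Java integers, "Max Neg" and "Max Pos" mean `Integer.MIN_VALUE` and `Integer.MAX_VALUE`, and zero counts as even. The class name `DomainAnalysis` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reporter: summarizes which value classes appear in one field's test data. */
final class DomainAnalysis {
    /** Maps each category name, in table order, to "yes" or "no" for the given values. */
    static Map<String, String> analyze(List<Integer> values) {
        Map<String, String> report = new LinkedHashMap<>();
        // Assumption: the extremes are the limits of a Java int.
        report.put("Max Neg", yesNo(values.contains(Integer.MIN_VALUE)));
        report.put("Neg", yesNo(values.stream().anyMatch(v -> v < 0)));
        report.put("Zero", yesNo(values.contains(0)));
        report.put("Pos", yesNo(values.stream().anyMatch(v -> v > 0)));
        // Assumption: zero is treated as an even number.
        report.put("Even", yesNo(values.stream().anyMatch(v -> v % 2 == 0)));
        report.put("Odd", yesNo(values.stream().anyMatch(v -> v % 2 != 0)));
        report.put("Max Pos", yesNo(values.contains(Integer.MAX_VALUE)));
        return report;
    }

    private static String yesNo(boolean present) {
        return present ? "yes" : "no";
    }
}
```

Feeding it the values recorded for times() in the Vocabulary table (-2, 0, 2) gives the same yes/no pattern as the times() row above.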

 

Reporters and Rewriters

Two types of interpretations stand out for me:

Reporters – tell you something about a set of tests

Rewriters – change a set of tests

The Index and Vocabulary examples above are reporters. The standard Fit interpretation is close to being a rewriter: it produces a version of the input file, modified to show the results of the tests. (The only thing that keeps it from being a true rewriter is that it leaves the original file in place, so you can run it again.)

Here are some more ideas for useful tools:

  • Count test cases – A reporter telling the total number of test cases for a fixture
  • Count fixtures – A reporter telling how many times a fixture occurs
  • Operators – A reporter telling the column names in RowFixtures and ColumnFixtures (or the second column of ActionFixtures)
  • Operands – A reporter telling the data values used (like "Vocabulary" above)
  • Column changer – A rewriter that can rename, delete, or insert a column
  • Cell rewriter – A rewriter that can change cell values (for a specific fixture in a specific column or row)

(I've created a simple version of the Index example above using the AllFiles fixture; the others are speculation.)
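To make the rewriter idea concrete, here is a speculative sketch of the "delete" half of a column changer. It assumes a table has already been read into a list of rows, with row 0 holding the fixture name and row 1 the column names, as in a ColumnFixture. `ColumnChanger` and `deleteColumn` are names I made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical rewriter: removes one named column from a ColumnFixture-style table. */
final class ColumnChanger {
    /**
     * Returns a copy of the table without the named column.
     * Assumes row 0 is the fixture name and row 1 is the header row.
     */
    static List<List<String>> deleteColumn(List<List<String>> table, String columnName) {
        int index = table.get(1).indexOf(columnName);
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>(table.get(0))); // the fixture-name row is kept unchanged
        for (List<String> row : table.subList(1, table.size())) {
            List<String> copy = new ArrayList<>(row);
            // Drop the cell only when the column exists and the row is long enough.
            if (index >= 0 && index < copy.size()) {
                copy.remove(index);
            }
            result.add(copy);
        }
        return result;
    }
}
```

Like the reporters, this needs to know only the table's layout, not how the fixture is implemented.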

Reflection is OK

There are useful semantics that don't try to interpret a fixture, but it can help to "peek" a little. For example, knowing that something is a ColumnFixture tells us that it's likely that the row after the fixture name consists of input and output fields. We can use this information fruitfully. The Vocabulary example above made use of this knowledge.

Furthermore, there is nothing wrong with getting help. If someone had a new type of fixture that subclassed Fixture, but still had ColumnFixture-like semantics, they could provide a helper analysis class that would let us know this.

The goal is not to avoid using fixture-aware code, it's just to avoid the quagmire of trying to interpret another program.

Call to Action

We've had a few years to work with Fit. People are creating test suites large enough to be interesting, and large enough that they need help managing them.

It's time to experiment with new interpretations of Fit tests. (We may still use Fit to help with this task.) The need is there now, felt by real people doing real work.

[June, 2005.]

Fit Spreadsheet – Output

Starting Fit

To use fit, you create a web page that has tables in it; the tables specify tests. (There are other options but that is easiest.) In this case, I'm using Microsoft Word™ and saving the file in HTML format.

 

The fit FileRunner acts as a filter: given a web page, it copies text outside of tables as is, and runs your program on the table entries. Some table entries represent tests that can pass or fail; fit colors them green or red respectively. The output is another HTML file.

 

Fit will also put a summary in the file if you put in a table like this:

 

fit.Summary

counts 0 right, 0 wrong, 0 ignored, 0 exceptions
input file C:\P4\FirstFit\fit\FirstFit-in.htm
input update Thu May 01 10:51:42 EDT 2003
output file C:\P4\FirstFit\fit\FirstFit-out.htm
run date Thu May 01 10:58:28 EDT 2003
run elapsed time 0:00.05

 

With this tool, you don’t manipulate screen elements directly. Instead, you work with an abstraction of them. To me, it feels like talking to somebody over the phone, trying to tell them how to use an application. (“In cell cee seventeen, put equals a one; then go to a one and type ‘fish’.”)

 

This article shows the output from fit; the input is here.

Programming and Configuration Notes

Fit is a tool for customers and testers, but programmers will use it as well, and will have to write some of the fixtures the team uses. In this paper, I’ve tried to use the framework mostly straight out of the box.

 

The CLASSPATH needs to include fit.jar (both in the DOS window and the IDE). The runner command I’m using is:

java fit.FileRunner FirstFit-in.htm FirstFit-out.htm

 

When I do this on the file I have so far, it creates the output file and writes this to the console:

0 right, 0 wrong, 0 ignored, 0 exceptions

Fixtures

Tables in the input file have the name of a fixture in the first row. A fixture is a class that knows how to process the table. Fit comes with several fixtures built in, and programmers can create others.

 

One simple fixture is the ColumnFixture. In this fixture, the first row is the fixture name, and the second row has the names of data. If a name ends without parentheses, it is regarded as a field to fill in; with parentheses, it’s treated as a method (function) call. The fixture fills in all the data fields, and then calls the methods to verify that they return the expected results.
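The naming rule for the second row can be sketched in a few lines of Java. This is not Fit's own code, just an illustration of the rule; the class `ColumnHeader` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits ColumnFixture header names into fields and methods. */
final class ColumnHeader {
    final List<String> inputs = new ArrayList<>();  // names without parentheses: fields to fill in
    final List<String> outputs = new ArrayList<>(); // names ending in "()": methods to call and check

    ColumnHeader(List<String> names) {
        for (String name : names) {
            if (name.endsWith("()")) {
                // Record the method name without its trailing parentheses.
                outputs.add(name.substring(0, name.length() - 2));
            } else {
                inputs.add(name);
            }
        }
    }
}
```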

 

Another standard fixture is the ActionFixture. This one consists of a series of commands. These include:

  • start classname: Creates an object of the specified class
  • enter field value: Sets the field to the value
  • press button-name: Calls the method corresponding to the button
  • check method value: Checks that the method returns the expected value

 

The ActionFixture ignores anything past the first three columns; we’ll use the fourth column for comments.

 

So, we’re finally ready to start our application.

 

fit.ActionFixture
start Spreadsheet Create a new spreadsheet.

 

This test doesn’t ask for much, but of course it fails. (There isn’t any code yet!)

            0 right, 0 wrong, 0 ignored, 1 exceptions

 

Programmer Notes

The exception is thrown because the Spreadsheet object doesn’t exist. To create it as simply as possible, make it extend Fixture:

 

import fit.Fixture;

public class Spreadsheet extends Fixture {}

 

This gets us back to

            0 right, 0 wrong, 0 ignored, 0 exceptions

 

I’ve put together stubs for the fixtures used in this article: Spreadsheet.java, SpreadsheetFormula.java, and Address.java; here’s a zip file containing all three.

A Few Stories

We have several things we want our spreadsheet to do:

  • Track the contents of cells
  • Distinguish data from formulas
  • Provide both data and formula views of cells
  • Support “+” for appending strings, “’” for reversing strings, “()” for grouping, and “>” for string containment.

Cells

The spreadsheet has a number of cells, each of which has an address. Cells contain string data or formulas.

 

We’ll assume several screen elements:

  • a1 – the cell “A1”. For “enter,” we’ll put something in the cell; for “check,” we’ll get its displayed value.
  • b1 – the same for cell “B1”.
  • formula – the formula of the last-mentioned cell.

 

We’ll start with a simple data cell.

fit.ActionFixture Comments
start Spreadsheet
enter a1 abc
check a1 abc Text in cell
check formula abc Formula is same. (Looks in last-mentioned cell.)

 

Now let’s add in a formula cell. (Note that this table omits the “start” line; this means it’s working on the same object as before. This lets us not repeat the setup, but it also makes the tests less independent.)

 

fit.ActionFixture Comments
enter a1 abc
enter b1 =A1 Simple copying formula
check formula =A1 Formula is there
check a1 abc Original text in A1
check b1 abc Text was copied to B1

 

The essence of a spreadsheet is the automatic updates. Let’s change A1 and see it happen.

 

fit.ActionFixture Comments
enter a1 abc
enter b1 =A1 Simple copying formula
check b1 abc Copied value
enter a1 revised Update A1
check b1 revised Automatically updates B1

 

 

We already have quite a few elements in use, though we haven’t specified exactly what is valid. Let’s just note the “specification debt” and move on.

  • What can a cell hold? Empty string, other string, formula starts with “=”
  • What’s a valid address? Letter plus digits; ignore leading 0s; case-insensitive.
  • What’s a valid formula? So far, we’ve just used a simple cell reference, but we want operators too.
  • What happens when a cell has an invalid formula?
  • What happens when a cell refers to a cell containing a formula?
  • What happens when formulas form a loop?

 

We’ll pursue all these, but let’s start with formulas.

Formulas

Formulas can reference formulas.

 

SpreadsheetFormula
a1 b1 c1 d1 a1() b1() c1() d1()
data =A1 =B1 =C1 data data data data

 

  

Formulas get more interesting when there are operators available. The reverse operator (‘) is probably a good one to start with.

SpreadsheetFormula
a1 b1 b1()
abc =A1' cba
abc =A1'''' abc

 

 

 

The most useful string operator is probably append (+):

SpreadsheetFormula
a1 b1 c1 b1() c1()
abc =A1+A1 blank abcabc blank
abc def =A1+B1+B1+A1 def abcdefdefabc

 

 

We have enough features that we can demonstrate an identity: (XY)’=Y’X’. We don’t have parentheses yet, but we can simulate this by putting the parts in separate cells.

 

SpreadsheetFormula
a1 b1 c1 d1 e1 d1() e1()
abc xyz ignored =A1+B1 =D1' abcxyz zyxcba
abc xyz =B1' =A1' =C1+D1 cba zyxcba

 

  

Parentheses can be used to group operators. Let’s re-do the previous test, allowing parentheses:

SpreadsheetFormula
a1 b1 c1 c1()
abc xyz =(A1+B1)' zyxcba
abc xyz =B1'+A1' zyxcba

 

The operator “>” tells whether one string contains another one. If the first string contains the second, the result is the second. If the first string doesn’t contain the second, the result is an empty string.

 

SpreadsheetFormula
a1 b1 c1 c1()
banana ana =A1>B1 ana
banana bab =A1>B1 blank
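The three operators seen so far are small enough to sketch directly. This is only an illustration of the rules as stated in this article, not the code behind the fixtures; `StringOps` and its method names are hypothetical.

```java
/** Hypothetical helpers for the three string operators described in this article. */
final class StringOps {
    /** The ' operator: reverses a string. */
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** The + operator: appends the right string to the left one. */
    static String append(String left, String right) {
        return left + right;
    }

    /** The > operator: yields the second string if the first contains it, else the empty string. */
    static String contains(String first, String second) {
        return first.contains(second) ? second : "";
    }
}
```

With these definitions, the identity (XY)’=Y’X’ can be checked by comparing `reverse(append(x, y))` with `append(reverse(y), reverse(x))`.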

 

We haven’t talked about precedence yet. The ‘ and () operators have the highest precedence, then +, then >. A1+B1+C1 is a legal expression, but A1>B1>C1 is not.

 

SpreadsheetFormula
a1 b1 c1 c1()
abc xyz =A1+B1' abczyx
abc xyz =(A1+B1)' zyxcba

 

 

SpreadsheetFormula
a1 b1 c1 d1 e1 e1()
abcdef ghijkl e hgf =A1+B1>C1+D1' efgh

 

Backfill

We have several questions left open:

  • What can a cell hold? Empty string, other string, formula starts with “=”
  • What’s a valid address? Letter plus digits; ignore leading 0s; case-insensitive.
  • What happens when a cell has an invalid formula?
  • What happens when formulas form a loop?

 

The previous tests made a quick pass through the system. I think of them as generative: they help define the essence of the system. But questions like the above require us to fill in the gaps. I think of tests that do things like check “corner cases,” error cases, and how features interact as elaborative; they fill in what we already have. They might find problems, but they may well work already, depending on how the system was built.

What a cell holds

We already have test cases where a cell holds a string, and where a cell holds a formula, but it would be prudent to check that the operators work correctly on empty strings. If e is the empty string and x is a non-empty string, we expect:

            e’ = e

            e+e=e

            e+x=x

            x+e=x

            e>e=e

            e>x=e

            x>e=e

 

As I go to write the test, I realize that we never specified what a cell starts with. The answer, of course, is the empty string. So we’ll rely on that: A1 will be empty.

 

fit.ActionFixture Comments
start Spreadsheet
check a1   Verify that cell starts empty.

 

Then we can verify those rules about working with the empty string:

 

SpreadsheetFormula
a1 b1 c1 c1() Comment
blank blank =A1' blank e’=e
blank blank =A1+A1 blank e+e=e
blank blank =A1>A1 blank e>e=e
blank abc =A1+B1 abc e+x=x
blank abc =B1+A1 abc x+e=x
blank abc =A1>B1 blank e>x=e
blank abc =B1>A1 blank x>e=e

 

Valid Addresses

There are two places we use addresses: in the address field and in the cells with formulas. When we get a “real” (graphical) interface, the address will mostly be implicit. But even so, we’ll test it here just to be safe.

 

Let’s use a ColumnFixture for this: we’ll put address in one column, valid() in another, and standardized() in another. (A programmer will have to write the new fixture for us.)

 

The rules are: a valid address is a letter (A-Z, a-z) followed by one or more digits (0-9). Case is ignored. Leading 0s are ignored. “0” is not a valid row number.

 

Address
address valid() standardized()
A1 true A1
a1 true A1
A9874 true A9874
Z1 true Z1
z1 true Z1
Z3992 true Z3992
z3992 true Z3992
AA393 false
zX202 false
é17 false
1 false
~1 false
~D1 false
y&1 false
^ false
X392% false
H001 true H1
j00010 true J10
e000 false
A0 false
z0 false
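The address rules above fit into a single regular expression. Here is a sketch of the logic an Address fixture might delegate to; the class name `CellAddress` is hypothetical, and returning an empty string from `standardized` for an invalid address is my assumption, based on the blank cells in the table.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper implementing the address rules stated above. */
final class CellAddress {
    // One ASCII letter, optional leading zeros, then a row number starting with 1-9.
    private static final Pattern ADDRESS = Pattern.compile("([A-Za-z])0*([1-9][0-9]*)");

    /** True when the whole string is a letter followed by a non-zero row number. */
    static boolean valid(String address) {
        return ADDRESS.matcher(address).matches();
    }

    /** Upper-case column plus row without leading zeros; empty string when invalid (an assumption). */
    static String standardized(String address) {
        Matcher m = ADDRESS.matcher(address);
        if (!m.matches()) {
            return "";
        }
        return m.group(1).toUpperCase(Locale.ROOT) + m.group(2);
    }
}
```

Requiring the row number to start with 1-9 is what rejects e000, A0, and z0 while still accepting H001 and j00010.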

 

 

Let’s make sure that case-insensitivity works in formulas:

 

SpreadsheetFormula
a1 b1 b1()
abc =A1+a1 abcabc

 

Formula Errors

If a formula contains an error, we’d like it to display as “#error.” We’ll put all the invalid names from the previous table into formulas, and verify that formulas behave correctly. Then we’ll try various improper combinations of operators.

fit.ActionFixture
start Spreadsheet Create a new spreadsheet.
enter a1 =AA393 Bad address
check a1 #error Marked as error
check formula =AA393 Formula as written
enter a1 =A2 Change to valid address
check a1   Make sure #error is cleared

 

 

SpreadsheetFormula
a1 a1() Comment
=zX202 #error Two letters
=é17 #error expected ? actual Non-ASCII
=1 #error No letters
=~1 #error No letters
=~D1 #error Unacceptable character
=y&1 #error Extra character
=^ #error No letters/digits
=e000 #error expected ? actual Too many digits
=A0 #error expected ? actual Invalid row #
=z0 #error expected ? actual Invalid row #
= #error Missing formula

  

Then we’ll get to some operators:

SpreadsheetFormula
a1 a1() Comment
='A2 #error ' should be postfix
='A2' #error Can’t be before and after
=A2+ #error Need other term
=A3+A4+ #error Need other term
=A2++A3 #error Missing term
=A2+'+A3 #error ' isn’t a term
=A2'''+A3 blank OK to mix things
=A2) #error Missing (
=(A2 #error Missing )
=((((((((((((A2)))))))))))) blank OK – big expression
=((((((A2+(A3))))+A4) #error Unbalanced – too few )
=(((A2>A3 #error Unbalanced – too few )
=(A2>A3))) #error Unbalanced – too many )
=A2>A3> #error Can’t trail >
=A2>A3>A4 #error Can’t repeat >

 

Loops

If a formula uses itself (directly or indirectly), we don’t want it to loop forever trying to figure it out. Instead, we’d like the display to be “#loop.”

 

SpreadsheetFormula
a1 b1 c1 d1 e1 a1() e1()
=A1 blank blank blank blank #loop blank
=B1 =C1 =F1+D1 =E1 no-loop no-loop no-loop
=B1 =C1 =F1+D1 =E1 =A1 #loop #loop

 

Conclusions

This paper has demonstrated a set of tests using the fit acceptance testing framework. Some things to note:

  • The tests here have been written as if a customer specified them, without much demonstration of the programming cycle. But programmers can work with these tests in much the way they would with JUnit.
  • The tests are written without benefit of the feedback of a working system. (I wrote just enough code to make the first test not throw an exception.)
  • The tests look at only part of the system: the core functionality. There are other aspects of a real application that we aren’t testing. (For example, it may be non-trivial to connect a screen to the core code.)
  • Even a small application such as this requires a fairly large set of tests. With more programming work on the fixtures, we might be able to reduce some of the noise. Real applications will organize tests into multiple files, and will have to pay more attention to the challenges of consistency, test independence, and feature interaction.
  • It feels smooth to mix light natural-language specification with formal, executable tests.
  • Fit has a number of features we haven’t used.

 

I’ve heard that many teams use xUnit for unit testing, but still struggle to get customer tests before or even after stories are implemented. I hope frameworks such as fit can help lower the barriers to doing this crucial task. 

 

fit.Summary

counts 94 right, 4 wrong, 0 ignored, 0 exceptions
input file C:\P4\FirstFit\fit\FirstFit-in.htm
input update Thu May 01 10:51:42 EDT 2003
output file C:\P4\FirstFit\fit\FirstFit-out.htm
run date Thu May 01 10:58:28 EDT 2003
run elapsed time 0:00.14

 

Resources and Related Articles

 

[Written April 20, 2003; revised April 26, 2003, to correct mis-stated identity & in response to Ward Cunningham's great suggestions about improving the fixtures; 2012 – the WordPress version is designed to simulate the original look.]

Fit Spreadsheet

Ward Cunningham has created an acceptance testing framework known as fit. (See http://fit.c2.com for more details.) In this brief experiment, we'll use tests to help specify a simple spreadsheet for strings.

Starting Fit

To use fit, you create a web page that has tables in it; the tables specify tests. (There are other options but that is easiest.) In this case, I'm using Microsoft Word(tm) and saving the file in HTML format.

The fit FileRunner acts as a filter: given a web page, it copies text outside of tables as is, and runs your program on the table entries. Some table entries represent tests that can pass or fail; fit colors them green or red respectively. The output is another HTML file.

Fit will also put a summary in the file if you put in a table like this:

fit.Summary

With this tool, you don't manipulate screen elements directly. Instead, you work with an abstraction of them. To me, it feels like talking to somebody over the phone, trying to tell them how to use an application. ("In cell cee seventeen, put equals a one; then go to a one and type 'fish'.")

This article shows the input to fit; the result of running it is here.

Programming and Configuration Notes

Fit is a tool for customers and testers, but programmers will use it as well, and will have to write some of the fixtures the team uses. In this paper, I've tried to use the framework mostly straight out of the box.

The CLASSPATH needs to include fit.jar (both in the DOS window and the IDE). The runner command I'm using is:

java fit.FileRunner FirstFit-in.htm FirstFit-out.htm

When I do this on the file I have so far, it creates the output file and writes this to the console:

0 right, 0 wrong, 0 ignored, 0 exceptions

Fixtures

Tables in the input file have the name of a fixture in the first row. A fixture is a class that knows how to process the table. Fit comes with several fixtures built in, and programmers can create others.

One simple fixture is the ColumnFixture. In this fixture, the first row is the fixture name, and the second row has the names of data. If a name ends without parentheses, it is regarded as a field to fill in; with parentheses, it's treated as a method (function) call. The fixture fills in all the data fields, and then calls the methods to verify that they return the expected results.

Another standard fixture is the ActionFixture. This one consists of a series of commands. These include:

  • start classname: Creates an object of the specified class
  • enter field value: Sets the field to the value
  • press button-name: Calls the method corresponding to the button
  • check method value: Checks that the method returns the expected value

The ActionFixture ignores anything past the first three columns; we'll use the fourth column for comments.

So, we're finally ready to start our application.

fit.ActionFixture
start Spreadsheet Create a new spreadsheet.

This test doesn't ask for much, but of course it fails. (There isn't any code yet!)

            0 right, 0 wrong, 0 ignored, 1 exceptions

Programmer Notes

The exception is thrown because the Spreadsheet object doesn't exist. To create it as simply as possible, make it extend Fixture:

import fit.Fixture;

public class Spreadsheet extends Fixture {}

This gets us back to

            0 right, 0 wrong, 0 ignored, 0 exceptions

I've put together stubs for the fixtures used in this article: Spreadsheet.java, SpreadsheetFormula.java, and Address.java; here's a zip file containing all three.

A Few Stories

We have several things we want our spreadsheet to do:

  • Track the contents of cells
  • Distinguish data from formulas
  • Provide both data and formula views of cells
  • Support "+" for appending strings, "'" for reversing strings, "()" for grouping, and ">" for string containment.

Cells

The spreadsheet has a number of cells, each of which has an address. Cells contain string data or formulas.

We'll assume several screen elements:

  • a1 – the cell "A1". For "enter," we'll put something in the cell; for "check," we'll get its displayed value.
  • b1 – the same for cell "B1".
  • formula – the formula of the last-mentioned cell.

We'll start with a simple data cell.

fit.ActionFixture Comments
start Spreadsheet    
enter a1 abc  
check a1 abc Text in cell
check formula abc Formula is same. (Looks in last-mentioned cell.)

Now let's add in a formula cell. (Note that this table omits the "start" line; this means it's working on the same object as before. This lets us not repeat the setup, but it also makes the tests less independent.)

fit.ActionFixture Comments
enter a1 abc  
enter b1 =A1 Simple copying formula
check formula =A1 Formula is there
check a1 abc Original text in A1
check b1 abc Text was copied to B1

The essence of a spreadsheet is the automatic updates. Let's change A1 and see it happen.

fit.ActionFixture Comments
enter a1 abc  
enter b1 =A1 Simple copying formula
check b1 abc Copied value
enter a1 revised Update A1
check b1 revised Automatically updates B1

We already have quite a few elements in use, though we haven't specified exactly what is valid. Let's just note the "specification debt" and move on.

  • What can a cell hold? Empty string, other string, formula starts with "="
  • What's a valid address? Letter plus digits; ignore leading 0s; case-insensitive.
  • What's a valid formula? So far, we've just used a simple cell reference, but we want operators too.
  • What happens when a cell has an invalid formula?
  • What happens when a cell refers to a cell containing a formula?
  • What happens when formulas form a loop?

We'll pursue all these, but let's start with formulas.

Formulas

Formulas can reference formulas. We'll use a new ColumnFixture, SpreadsheetFormula, that lets us specify the inputs and expected outputs of cells. This fixture should access the same type of spreadsheet as used by Spreadsheet.

SpreadsheetFormula
a1 b1 c1 d1 a1() b1() c1() d1()
data =A1 =B1 =C1 data data data data

Formulas get more interesting when there are operators available. The reverse operator (') is probably a good one to start with.

SpreadsheetFormula
a1 b1 b1()
abc =A1' cba
abc =A1'''' abc

The most useful string operator is probably append (+). Fit ignores input cells that are left blank, so we'll explicitly use the word "blank" when we want an empty cell. The fixture will have to take this into account.

SpreadsheetFormula
a1 b1 c1 b1() c1()
abc =A1+A1 blank abcabc  
abc def =A1+B1+B1+A1 def abcdefdefabc
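One way a fixture could take the "blank" convention into account is to translate at the boundary between the table and the spreadsheet. The sketch below is a guess at that translation, not the code in the stub fixtures; `BlankKeyword` and its methods are hypothetical names.

```java
/** Hypothetical helper: maps the table keyword "blank" to and from the empty string. */
final class BlankKeyword {
    static final String BLANK = "blank";

    /** Converts the text of an input cell to the contents to enter into the spreadsheet. */
    static String fromTable(String text) {
        return BLANK.equals(text) ? "" : text;
    }

    /** Converts a displayed value to the text that is compared against an expected cell. */
    static String toTable(String value) {
        return value.isEmpty() ? BLANK : value;
    }
}
```

A side effect of this choice is that a cell cannot hold the literal word "blank"; the tests in this article never need it.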

We have enough features that we can demonstrate an identity: (XY)'=Y'X'. We don't have parentheses yet, but we can simulate this by putting the parts in separate cells.

SpreadsheetFormula
a1 b1 c1 d1 e1 d1() e1()
abc xyz ignored =A1+B1 =D1' abcxyz zyxcba
abc xyz =B1' =A1' =C1+D1 cba zyxcba

Parentheses can be used to group operators. Let's re-do the previous test, allowing parentheses:

SpreadsheetFormula
a1 b1 c1 c1()
abc xyz =(A1+B1)' zyxcba
abc xyz =B1'+A1' zyxcba

The operator ">" tells whether one string contains another one. If the first string contains the second, the result is the second. If the first string doesn't contain the second, the result is an empty string.

SpreadsheetFormula
a1 b1 c1 c1()
banana ana =A1>B1  ana
banana bab =A1>B1  

We haven't talked about precedence yet. The ' and () operators have the highest precedence, then +, then >. A1+B1+C1 is a legal expression, but A1>B1>C1 is not.

SpreadsheetFormula
a1 b1 c1 c1()
abc xyz =A1+B1' abczyx
abc xyz =(A1+B1)' zyxcba

 

SpreadsheetFormula
a1 b1 c1 d1 e1 e1()
abcdef ghijkl e hgf =A1+B1>C1+D1' efgh
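The precedence rules map naturally onto a small recursive-descent evaluator, with one method per precedence level. The sketch below makes some assumptions of mine: cells are looked up in a map from standardized address to plain string data (formulas that reference other formulas are not handled), missing cells are empty, and ">" is allowed inside parentheses. `FormulaEvaluator` is a hypothetical name, not one of the fixtures.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical recursive-descent evaluator for the string formulas in this article. */
final class FormulaEvaluator {
    private final String text;               // the formula text after the leading "="
    private final Map<String, String> cells; // standardized address -> plain string data
    private int pos = 0;

    private FormulaEvaluator(String text, Map<String, String> cells) {
        this.text = text;
        this.cells = cells;
    }

    /** Returns the displayed value of a cell entry, or "#error" for a malformed formula. */
    static String display(String entry, Map<String, String> cells) {
        if (!entry.startsWith("=")) return entry; // plain data displays as itself
        FormulaEvaluator e = new FormulaEvaluator(entry.substring(1), cells);
        try {
            String result = e.comparison();
            if (e.pos != e.text.length()) throw new IllegalArgumentException("trailing input");
            return result;
        } catch (IllegalArgumentException ex) {
            return "#error";
        }
    }

    // comparison := sum ( '>' sum )?   lowest precedence; a second '>' is left over as an error
    private String comparison() {
        String left = sum();
        if (accept('>')) {
            String right = sum();
            return left.contains(right) ? right : "";
        }
        return left;
    }

    // sum := term ( '+' term )*
    private String sum() {
        StringBuilder result = new StringBuilder(term());
        while (accept('+')) result.append(term());
        return result.toString();
    }

    // term := primary "'"*   postfix reverse binds tightest and may repeat
    private String term() {
        String value = primary();
        while (accept('\'')) value = new StringBuilder(value).reverse().toString();
        return value;
    }

    // primary := '(' comparison ')' | address
    private String primary() {
        if (accept('(')) {
            String inner = comparison();
            if (!accept(')')) throw new IllegalArgumentException("missing )");
            return inner;
        }
        int start = pos;
        while (pos < text.length() && Character.isLetterOrDigit(text.charAt(pos))) pos++;
        String address = text.substring(start, pos);
        // One ASCII letter, optional leading zeros, then a non-zero row number.
        if (!address.matches("[A-Za-z]0*[1-9][0-9]*")) {
            throw new IllegalArgumentException("bad address");
        }
        String standardized = address.substring(0, 1).toUpperCase(Locale.ROOT)
            + address.substring(1).replaceFirst("^0+", "");
        return cells.getOrDefault(standardized, ""); // assumption: unset cells are empty
    }

    /** Consumes the expected character if it is next in the input. */
    private boolean accept(char expected) {
        if (pos < text.length() && text.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }
}
```

Because `comparison` accepts at most one ">", an entry such as =A2>A3>A4 leaves unread input behind and is reported as #error, while A1+B1+C1 is consumed by the loop in `sum`.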

Filling in the Gaps

We have several questions left open:

  • What can a cell hold? Empty string, other string, formula starts with "="
  • What's a valid address? Letter plus digits; ignore leading 0s; case-insensitive.
  • What happens when a cell has an invalid formula?
  • What happens when formulas form a loop?

The previous tests made a quick pass through the system. I think of them as generative: they help define the essence of the system. But questions like the above require us to fill in the gaps. I think of tests that do things like check "corner cases," error cases, and how features interact as elaborative; they fill in what we already have. They might find problems, but they may well work already, depending on how the system was built.

What a cell holds

We already have test cases where a cell holds a string, and where a cell holds a formula, but it would be prudent to check that the operators work correctly on empty strings. If e is the empty string and x is a non-empty string, we expect:

            e' = e
            e+e=e
            e+x=x
            x+e=x
            e>e=e
            e>x=e
            x>e=e

As I go to write the test, I realize that we never specified what a cell starts with. The answer, of course, is the empty string. So we'll rely on that: A1 will be empty.

fit.ActionFixture Comments
start Spreadsheet    
check a1   Verify that cell starts empty.

Then we can verify those rules about working with the empty string:

SpreadsheetFormula
a1 b1 c1 c1() Comment
blank blank =A1' blank e'=e
blank blank =A1+A1 blank e+e=e
blank blank =A1>A1 blank e>e=e
blank abc =A1+B1 abc e+x=x
blank abc =B1+A1 abc x+e=x
blank abc =A1>B1 blank e>x=e
blank abc =B1>A1 blank x>e=e

Valid Addresses

There are two places we use addresses: in the address field and in the cells with formulas. When we get a "real" (graphical) interface, the address will mostly be implicit. But even so, we'll test it here just to be safe.

Let's introduce a new fixture, Address. It will be a ColumnFixture: we'll put address in one column, valid() in another, and standardized() in another. (A programmer will have to write the new fixture for us.)

The rules are: a valid address is a letter (A-Z, a-z) followed by one or more digits (0-9). Case is ignored. Leading 0s are ignored. "0" is not a valid row number.

Address
address valid() standardized()
A1 true A1
a1 true A1
A9874 true A9874
Z1 true Z1
z1 true Z1
Z3992 true Z3992
z3992 true Z3992
AA393 false  
zX202 false  
é17 false  
1 false  
~1 false  
~D1 false  
y&1 false  
^ false  
X392% false  
H001 true H1
j00010 true J10
e000 false  
A0 false  
z0 false  

Let's make sure that case-insensitivity works in formulas:

SpreadsheetFormula
a1 b1 b1()
abc =A1+a1 abcabc

Formula Errors

If a formula contains an error, we'd like it to display as "#error." We'll put all the invalid names from the previous table into formulas, and verify that formulas behave correctly. Then we'll try various improper combinations of operators.

fit.ActionFixture
start Spreadsheet Create a new spreadsheet.
enter a1 =AA393 Bad address
check a1 #error Marked as error
check formula =AA393 Formula as written
enter a1 =A2 Change to valid address
check a1   Make sure #error is cleared 

   

SpreadsheetFormula
a1 a1() Comment
=zX202 #error Two letters
=é17 #error Non-ASCII
=1 #error No letters
=~1 #error No letters
=~D1 #error Unacceptable character
=y&1 #error Extra character
=^ #error No letters/digits
=e000 #error Row number is zero
=A0 #error Invalid row #
=z0 #error Invalid row #
= #error Missing formula

 Then we'll get to some operators:

SpreadsheetFormula
a1 a1() Comment
='A2 #error '  should be postfix
='A2' #error Can't be before and after
=A2+ #error Need other term
=A3+A4+ #error Need other term
=A2++A3 #error Missing term
=A2+'+A3 #error ' isn't a term
=A2'''+A3 blank OK to mix things
=A2) #error Missing (
=(A2 #error Missing )
=((((((((((((A2)))))))))))) blank OK – big expression
=((((((A2+(A3))))+A4) #error Unbalanced – too few )
=(((A2>A3 #error Unbalanced – too few )
=(A2>A3))) #error Unbalanced – too many )
=A2>A3> #error Can't trail >
=A2>A3>A4 #error Can't repeat >

Loops

If a formula uses itself (directly or indirectly), we don't want it to loop forever trying to figure it out. Instead, we'd like the display to be "#loop."

SpreadsheetFormula
a1 b1 c1 d1 e1 a1() e1()
=A1 blank blank blank blank #loop blank
=B1 =C1 =F1+D1 =E1 no-loop no-loop no-loop
=B1 =C1 =F1+D1 =E1 =A1 #loop #loop
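
One way a programmer might detect such loops is to remember which cells are still being evaluated and to flag any formula that reaches one of them again. The sketch below is an assumption about how that could work, not the code behind these tests: `references` and `runs_into_loop` are hypothetical names, the spreadsheet is just a dictionary from address to cell text, and addresses are only upper-cased (leading zeros and invalid addresses are not handled here).

```python
import re

_REFERENCE = re.compile(r"[A-Za-z][0-9]+")

def references(text):
    """Addresses mentioned in a formula; plain text refers to nothing."""
    if not text.startswith("="):
        return []
    return [ref.upper() for ref in _REFERENCE.findall(text)]

def runs_into_loop(cells, start):
    """True if evaluating `start` reaches a cell that is still being evaluated."""
    in_progress = set()   # cells on the current evaluation path
    finished = set()      # cells already shown to be loop-free

    def visit(name):
        if name in in_progress:
            return True
        if name in finished:
            return False
        in_progress.add(name)
        looped = any(visit(ref) for ref in references(cells.get(name, "")))
        in_progress.discard(name)
        finished.add(name)
        return looped

    return visit(start)

# The last row of the table above: A1 -> B1 -> C1 -> D1 -> E1 -> A1.
row = {"A1": "=B1", "B1": "=C1", "C1": "=F1+D1", "D1": "=E1", "E1": "=A1"}
print(runs_into_loop(row, "A1"))
```

A cell for which this returns true would display "#loop" instead of a value.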

Conclusions

This paper has demonstrated a set of tests using the Fit acceptance testing framework. Some things to note:

  • The tests here have been written as if a customer specified them, without much demonstration of the programming cycle. But programmers can work with these tests in much the way they would with JUnit.
  • The tests are written without benefit of the feedback of a working system. (I wrote just enough code to make the first test not throw an exception.) When I went back to implement the system, I found a number of bugs in the tests.
  • The tests look at only part of the system: the core functionality. There are other aspects of a real application that we aren't testing. (For example, it may be non-trivial to connect a screen to the core code.)
  • Even a small application such as this requires a fairly large set of tests. With more programming work on the fixtures, we might be able to reduce some of the noise. Real applications will organize tests into multiple files, and will have to pay more attention to the challenges of consistency, test independence, and feature interaction.
  • It feels smooth to mix light natural-language specification with formal, executable tests.
  • Fit has a number of features we haven't used.

I've heard that many teams use xUnit for unit testing, but still struggle to get customer tests before or even after stories are implemented. I hope frameworks such as Fit can help lower the barriers to doing this crucial task.

fit.Summary

 


[Written April 20, 2003; revised April 26, 2003, to correct mis-stated identity & in response to Ward Cunningham's great suggestions about improving the fixtures. Revised May 1, 2003 to fix some test problems. 2012 – the WordPress version is designed to simulate the original look.]

Acceptance Tests for a Query System

Background

Suppose we're testing a library search system. The library has a database of book descriptions, and we give it queries. Each query produces a result set of the matching items.

When we consider this system, we might think of the database as a collection of records, and think of the query as a string. The query string represents a boolean expression, so we want to test its logical nature. Finally, the query string conforms to a grammar, so it might be well- or ill-formed.

Test Strategy

To get ideas on testing, we might go to a site like http://www.testing.com, and see Brian Marick's suggestions. Summarizing:

  • Collection: Test 0, 1, many, and duplicates
  • String: Test blank or not
  • Boolean: Test logical operators
  • Integer: Test 0, smallest, just below smallest, largest, and just above largest.

To this we'll add a test for "grammatical or not."

How will we define the tests? These tests work well with a spreadsheet. We can have one table for the database contents, and another for queries.

Data
Author Title Year
     
Queries
Query Test Value Comment
       

The semantics will be: load each line of the data table (after the first two), then run each query. Apply the test (with its optional value) to the result. The tests I have in mind are:

  • count – tell the number of items in the result
  • error – query gets an error ("value" column unused)
  • contains – verifies that a particular phrase is in the result

We might find we need to extend this list later, but this is a good starting point. We won't even use "contains" in these examples.

Queries

Queries have a simple structure:

query = term
query = query op term
op = AND | OR | NOT

The precedence is left to right, so "fish or beef and steak" is interpreted as "(fish or beef) and steak" (though parentheses are not allowed in queries). Matching should not be case sensitive; the operator names are not case sensitive either. Blanks are ignored.

The NOT operator may need a little explanation. In classical logic, not is a unary operator. In the search system, "a NOT b" is interpreted "a AND NOT b" (but the latter is not a legal expression in our query language). The NOT operator is used to eliminate unwanted terms. Thus, "extreme AND programming NOT sports" would match "Extreme Programming Explored" but not "Is Extreme Programming an Extreme Sport?"
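
To make the left-to-right rule concrete, here is a minimal sketch of an evaluator. It is not the library system's code. The names `parse`, `matches`, and `count` are hypothetical, and I'm assuming that a term matches when it appears anywhere in a record's author, title, or year, and that a blank query matches nothing.

```python
OPERATORS = {"AND", "OR", "NOT"}

def parse(query):
    """Split a query into (operator, term) steps; the first operator is None."""
    tokens = query.split()
    if not tokens:
        return []  # assumption: a blank query is legal and matches nothing
    if len(tokens) % 2 == 0:
        raise ValueError("a query must alternate term, operator, term")
    steps = [(None, tokens[0].lower())]
    for op, term in zip(tokens[1::2], tokens[2::2]):
        if op.upper() not in OPERATORS:
            raise ValueError(f"unknown operator: {op}")
        steps.append((op.upper(), term.lower()))
    return steps

def matches(steps, record):
    """Apply the steps strictly left to right to one record."""
    text = " ".join(str(field) for field in record).lower()
    result = False
    for op, term in steps:
        found = term in text
        if op is None:
            result = found
        elif op == "AND":
            result = result and found
        elif op == "OR":
            result = result or found
        else:  # "a NOT b" means "a AND NOT b"
            result = result and not found
    return result

def count(query, records):
    """Number of records the query matches."""
    steps = parse(query)  # grammar errors surface before any record is read
    return sum(1 for record in records if matches(steps, record))
```

Because tokens are classified purely by position, a word such as "AND" can also serve as a search term when it sits where a term is expected.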

Test Strategy Revisited

We'll start with the collection rules, and make four separate test databases: empty, one record, many records, and duplicates. For each of these, we'll try a variety of queries.

We'll test with blank and non-blank queries, and we'll try grammatical and non-grammatical queries. We'll apply the logical operator testing rules (which we haven't covered yet).

For the integer values, there are really only two places that use integers: the number of records in the collection (which we covered), and the size of the result. Results can have anywhere from zero to the number of items in the database; only 0 and "max" apply from the testing rules. But we'll certainly hit a result set of size "1" as well as some intermediate values.

The Simplest Test: An Empty Collection

What can we learn from an empty collection? It seems like that isn't even worth trying: who would have a library with no books? So: the data set is empty. (We may not even create the spreadsheet page for it.) Let's cover the test case for a blank or non-blank query string:

Queries
Query Test Value Comment
  count 0 Blank query
fish count 0 Non-blank query

Install this test and run it. That will show you the first reason to start with the simplest test: setting up the environment and running the test is non-trivial. You'd rather start with a small test set. In our example, there's a second reason it pays to start with an empty collection: we decided that a blank query returns a result with 0 items. We could have decided that it was an error. Murphy's Law tells you that if you didn't say, the programmers would have done the opposite of what you ended up wanting.

We can also test non-grammatical queries with an empty collection. It seems reasonable that error checking would happen before the query actually does its work, so it won't matter if the collection is empty or not. (And a quick chat with the programmers confirms our intuition.)

Queries
Query Test Value Comment
fish AND error   Nothing after AND
fish OR error   Nothing after OR
fish NOT error   Nothing after NOT
AND fish error   Nothing before AND
OR fish error   Nothing before OR
NOT fish error   Nothing before NOT
fish sticks error   No operator

We can certainly find other non-grammatical queries to try. Finally, we might toss in one legal but complicated and tricky one:

Queries
Query Test Value Comment
AND AND OR OR NOT NOT NOT count 0 Ugly but legal

One-Element Collection

We need a table with one record:

Data
Author Title Year
William C. Wake Extreme Programming Explored 2001

(What else would I use?)

You might start by repeating the earlier queries; they'll get the same answer. We can check that a record is found as expected for a simple query:

Queries
Query Test Value Comment
Explored count 1 Simple query
ExPLoRed count 1 Verify case insensitive

Brian Marick (www.testing.com) has suggested some good rules for testing booleans:

  • AND: try once with all conditions true, and once with each condition false and the others true
  • OR: try once with all conditions false, and once with each condition true and the others false
  • NOT: try once true, and once false

In our case, remember that our "NOT" operator is really "AND NOT", so for "a NOT b" we'll need to test (a true, b false), (a false, b false), and (a true, b true).

In the comments, TF/FT/etc. indicates which term is true and which is false.

Queries
Query Test Value Comment
Extreme AND Explored count 1 AND TT
Extreme AND sports count 0 AND TF
sports ANd programming count 0 AND FT
skiing Or skipping count 0 OR FF
Wake OR skateboard count 1 OR TF
dive OR wake count 1 OR FT
2001 NOT dancing count 1 NOT TF
dancing NOT singing count 0 NOT FF
2001 NoT extreme count 0 NOT TT

Note that we've checked a couple other things by having variety in the test cases: we've checked things in all three columns, and we've mixed case in both terms and operators. This can be beneficial or not. On the good side, it cuts down the number of test cases. On the bad side, it confounds two things. If "dive OR wake" fails, is it because matching is case-sensitive, or because there's a problem in "OR"?

You'll have to come to your own balance on this. If you find yourself unable to figure out what the problem is, it can be a sign to simplify your tests. This can be partially mitigated by keeping early tests simple and focused on one aspect, and letting later tests build complexity.

We can explore a few other aspects of booleans. For one thing, there's a set of identity rules: "A OR A = A"; "A AND A = A"; "A NOT A = false". This shows up in our queries like this:

Queries
Query Test Value Comment
Extreme AND Extreme count 1 a AND a
Programming OR Programming count 1 a OR a
Explored NOT Explored count 0 a NOT a

We might want to verify that the term order doesn't matter. (We want "Programming AND Extreme" to have the same result as "Extreme AND Programming".)

Queries
Query Test Value Comment
Extreme AND Programming count 1 a AND b
Programming AND Extreme count 1 b AND a

Finally, we'd like to verify that operator precedence is right. Remember that "a OR b AND c" should be interpreted "(a OR b) AND c" rather than "a OR (b AND c)". (Some grammars for logical operators, and most programming languages, treat AND as higher precedence than OR; we'd like to make sure that doesn't bubble out to the query language.)

A few minutes' thought (or a half hour of logical derivation) will convince you that if "(a OR b) AND c" is true, then "a OR (b AND c)" will also be true. To show that the interpretation is wrong, we'd have to find a case where the second one matches but the first one doesn't. One such case is when A matches, but B and C do not.

Queries
Query Test Value Comment
Extreme OR Sports AND Agile count 0 precedence
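
If you'd rather not spend the half hour, the claim about the two readings can be checked by brute force over all eight combinations of truth values. This is just a sketch; the function names are mine.

```python
from itertools import product

def left_to_right(a, b, c):
    """The query language's reading: (a OR b) AND c."""
    return (a or b) and c

def and_first(a, b, c):
    """The usual programming-language reading: a OR (b AND c)."""
    return a or (b and c)

rows = list(product([False, True], repeat=3))

# Whenever the left-to-right reading is true, the other reading is true too.
implies = all(and_first(a, b, c) for a, b, c in rows if left_to_right(a, b, c))

# The combinations (a, b, c) where the two readings disagree.
differ = [(a, b, c) for a, b, c in rows
          if left_to_right(a, b, c) != and_first(a, b, c)]

print(implies)
print(differ)
```

The two readings disagree exactly when A matches and C does not, whatever B does, so the query in the table above (A matches, B and C don't) is one of the two distinguishing cases.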

We're definitely hitting some subtleties now. What happens if we don't test this case? Are we doomed? What about the many other ways queries could be wrong?

You always have to draw a line somewhere. It took me about an hour to convince myself that the precedence problem could exist, to come up with an example, and to prove that the example could be logically derived. If I hadn't done this, there's a chance this bug would surface. The biggest danger is that the programmers would misunderstand the specification, and assume "normal" precedence. But as customer, I'm in the room with them talking about it. If we had just missed this subtlety, note that this is an obscure case in the grand scheme of things: I've tested one- and two-term queries more thoroughly, and those are by far the most common ones users will form.

That doesn't completely excuse the potential problem. If I were worried enough, I could get someone to write a small program that generated "all" queries with 3 or 4 terms. ("All" meaning all combinations of operators and terms present or not, not "all" possible terms.) "a op b op c op d": Each term might be present or not (2^4=16), and each operator could have one of 3 values (3*3*3), so there are 16*27=432 combinations I might look at. This is an uncomfortable number, but not unbearable. In normal use, I'd probably just pick a handful of problematic-looking ones, and trust the result.
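
The small program could be as simple as the sketch below. I'm reading "present or not" as "the term does or doesn't occur in the record," so each term is either a word that matches or one that doesn't; the two sample words and the name `four_term_queries` are placeholders of my own.

```python
from itertools import product

OPERATORS = ("AND", "OR", "NOT")

def four_term_queries(hit="extreme", miss="zebra"):
    """Yield every "a op b op c op d" query: 2**4 term choices times 3**3 operators."""
    for present in product((True, False), repeat=4):
        terms = [hit if is_present else miss for is_present in present]
        for ops in product(OPERATORS, repeat=3):
            yield (f"{terms[0]} {ops[0]} {terms[1]} {ops[1]} "
                   f"{terms[2]} {ops[2]} {terms[3]}")

queries = list(four_term_queries())
print(len(queries))  # 16 * 27 = 432
```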

Collection with Many Elements

We'll define a bigger collection, and repeat our queries. (They'll have slightly different result counts, but none will go from 0 to a positive number or vice versa.)

Data
Author Title Year
Kent Beck Extreme Programming Explained: Embrace Change 1999
Kent Beck and Martin Fowler Planning Extreme Programming 2000
Martin Fowler et al. Refactoring 1999
Ron Jeffries et al. Extreme Programming Installed 2000
William C. Wake Extreme Programming Explored 2001
Queries
Query Test Value Comment
Explored count 1 Simple query
ExPLoRed count 1 Verify case insensitive
Extreme AND Explored count 1 AND TT
Extreme AND sports count 0 AND TF
sports ANd programming count 0 AND FT
skiing Or skipping count 0 OR FF
Wake OR skateboard count 1 OR TF
dive OR wake count 1 OR FT
2001 NOT dancing count 1 NOT TF
dancing NOT singing count 0 NOT FF
2001 NoT extreme count 0 NOT TT
Extreme AND Extreme count 4 a AND a
Programming OR Programming count 4 a OR a
Explored NOT Explored count 0 a NOT a
Extreme AND Programming count 4 a AND b
Programming AND Extreme count 4 b AND a
Extreme OR Sports AND Agile count 0 precedence

We'll also add a new query to cover the case "all records are in the result set." We had this for the one-record case, but we'd feel better seeing it on a bigger set as well.

Queries
Query Test Value Comment
Programming OR Fowler count 5 Find all

Collection with Duplicates

Verify that duplicated entries are treated the same. We decide we won't "de-dup" but rather just return what we find.

Data
Author Title Year
Kent Beck Extreme Programming Explained: Embrace Change 1999
Kent Beck Extreme Programming Explained: Embrace Change 1999
Kent Beck and Martin Fowler Planning Extreme Programming 2000
William C. Wake Extreme Programming Explored 2001
Martin Fowler et al. Refactoring 1999
Ron Jeffries et al. Extreme Programming Installed 2000
William C. Wake Extreme Programming Explored 2001
Kent Beck Extreme Programming Explained: Embrace Change 1999
Queries
Query Test Value Comment
Extreme count 7 Retain dups
Extreme OR Refactoring count 8 Find all with dups
Refactoring count 1 Find 1 with dups

Conclusion

We've created an initial set of tests for a simple query system that might show up in library software. Several things stand out:

  • A spreadsheet is a reasonable tool for capturing this type of test.
  • Some simple test rules (from www.testing.com) provided guidance on how to create test cases.
  • Notice that we have both the test query and its expected result.
  • Many tests are straightforward, but some of them bring out real issues in the specification.

Sidebar: Running the Tests

A spreadsheet provides the test creator with a comfortable, well-understood interface, and it can write easily processed files.

For the example we used, the tester and the programmer have to agree on the columns and their meanings, and the file format. When the tester is done writing a file, they can save it into a tab-delimited or bar-delimited file (whatever they agree on with the programmers). The programmers can write a simple program that will read the file and do the appropriate function. The programmers might generate code, or just have the program read and execute functions directly.
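
As a rough idea of how simple that program can be, here is a sketch that reads a tab-delimited Queries table and checks each row. Everything specific in it is an assumption: the file is taken to start with the header row (Query, Test, Value, Comment), and `search` stands in for whatever function the real system provides, here assumed to return a list of matching titles and to raise `ValueError` for an ill-formed query.

```python
import csv
import io

def run_tests(lines, search):
    """Yield (passed, row) for each row of a tab-delimited Queries table."""
    for row in csv.DictReader(lines, delimiter="\t"):
        test, value = row["Test"], row["Value"]
        try:
            result = search(row["Query"])
        except ValueError:
            # The query was rejected; that is a pass only if we expected it.
            yield test == "error", row
            continue
        if test == "count":
            yield len(result) == int(value), row
        elif test == "contains":
            yield any(value.lower() in item.lower() for item in result), row
        else:
            # An expected error did not happen, or the test name is unknown.
            yield False, row

def fake_search(query):
    """Stand-in for the real system, just enough to exercise the runner."""
    if query.endswith("AND"):
        raise ValueError("nothing after AND")
    return ["Extreme Programming Explored"] if "explored" in query.lower() else []

table = (
    "Query\tTest\tValue\tComment\n"
    "Explored\tcount\t1\tSimple query\n"
    "fish AND\terror\t\tNothing after AND\n"
    "fish\tcount\t0\tNo match\n"
)
for passed, row in run_tests(io.StringIO(table), fake_search):
    print("pass" if passed else "FAIL", row["Query"], row["Comment"])
```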

I think of the GUI (graphical user interface) as being lopped off or set aside, and the spreadsheet as providing an alternative interface. It's often best to focus on feature testing first, without worrying about the GUI until the features are right. A spreadsheet is nice for this approach: the tester still gets a GUI, just not one customized to the problem at hand.


[May 25, 2001.]

Acceptance Test Mechanisms

An acceptance test is a test that the user defines, to tell whether the system as a whole works the way the user expects. Ideally, the acceptance tests are defined before the code that implements the feature.

Acceptance tests are run frequently in an XP project, usually daily or more often, and certainly every week. The tests give a picture of how much of the desired functionality will work. Because the tests are run so often, there is a payoff if they can be automated.

Progress on acceptance tests is one of the crucial metrics that XP projects often collect.

There are several mechanisms that can be used to run acceptance tests:

  • Manual Test
  • GUI testing tools
  • Code
  • Script
  • Spreadsheet
  • Template

Manual Test

The simplest mechanism for running a test is to just make a plan on paper, and then do the steps manually.

GUI Testing Tools

There are commercial tools that let you run a system, recording your activities and the system's responses. You run it once to establish a "golden" copy, and then automatically after that to verify that things haven't changed.

This might be useful for some parts of testing, but it's not as good as you might think.

  • If the interface changes, the tests are invalidated. (How significant an interface change must be to trigger this problem depends on the tool's exact approach.)
  • The interface is often (usually?) not the core of the problem being solved. Thus the interface might not be done until toward the end of the project, yet all of the functionality should be tested along the way.

Code

In the code approach, you get a programmer to write the code that will run the tests you specify. This code is usually straightforward. Some teams use frameworks such as JUnit to manage and run the acceptance test code.

Script

"Script" is a simplified form of code. Programming languages typically have several ways to repeat actions, to do something if a condition is or isn't true, and so on, but most test code doesn't need these more complicated parts of programming.

So, programmers might create a scripting language. Tests are usually stylized, simple code, and the programmers will provide a way to run them. The language might be a subset of the one the programmers are using (e.g., Java), or a more "scripting" language like Perl, Ruby, or Python.

Example:

doc("Extreme Programming Explained", "Kent Beck", 1999);
doc("Planning Extreme Programming", "Kent Beck and Martin Fowler", 2000);
expect("Extreme and Programming", 2);
expect("Explained", 1);
expect("Fish", 0);
expect("Explained or Planning", 2);

 

 

 

Downsides: Script languages still require a lot of attention to details, and may have unwanted complications. Even in this simple example, there are several issues that the test writer has to be careful about:

 

  • Don't forget the semicolon at the end of each line.
  • How do you put quotes inside quotes?
  • Numbers vs. strings

 

Benefits: It's not all bad news:

 

  • It's easy to get started (by adapting an existing language)
  • Gives the user direct ownership of the tests
  • The solution is flexible.

 

Providing a scripting language is often the easiest way to get a user writing tests.

 

Spreadsheet

 

When you get to their essence, many tests can be represented in a spreadsheet. This gives the user a well-understood way to create, edit, and manage tests.

 

Example:

 

Title Author Year
Extreme Programming Explained Kent Beck 1999
Planning Extreme Programming Kent Beck and Martin Fowler 2000

Query Expected Comment
Extreme and Programming 2 AND found
Explained 1 No booleans
Fish 0 Term not present
Explained or Planning 2 OR found

 

Spreadsheet formats are not hard for a program to handle. Spreadsheets almost always have a way to save files in a format where commas, tabs, or "|" bars separate the fields. Then the program just reads one row per line and does what is needed. Different types of tests might use different programs, but still work with the spreadsheet approach.

 

Template

 

As you grow a bunch of different tests, it becomes more of a pain to write a new small program for each test type. You can use a "template" approach that combines scripting and spreadsheets.

 

The user writes the script that would test one row, and then "templatizes" it by replacing data values with a generic name. Then one program can create the actual script by merging the template and the spreadsheet.

 

Example:

 

*{
test($Comment);
expect($Query, #Expected);
}*

In this made-up example template, the part contained in the "*{ }*" brackets is repeated once per spreadsheet row.
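
The merging program might look something like this sketch. Since the template notation is made up, so are the details here: I'm assuming `$Name` is replaced by the column's value as a quoted string and `#Name` by the value as a bare number, and `expand` is a hypothetical name.

```python
import re

def expand(template, rows):
    """Repeat the "*{ }*" part of the template once per spreadsheet row."""
    open_at = template.index("*{")
    close_at = template.index("}*")
    body = template[open_at + 2:close_at].strip("\n")

    def fill(row):
        def value(match):
            sigil, name = match.group(1), match.group(2)
            text = row[name]
            if sigil == "$":  # assumption: $ means "insert as a quoted string"
                return '"' + text.replace('"', '\\"') + '"'
            return text       # assumption: # means "insert as a bare number"
        return re.sub(r"([$#])(\w+)", value, body)

    repeated = "\n".join(fill(row) for row in rows)
    # Anything outside the brackets is copied through unchanged.
    return template[:open_at] + repeated + template[close_at + 2:]

template = "*{\ntest($Comment);\nexpect($Query, #Expected);\n}*"
rows = [
    {"Query": "Extreme and Programming", "Expected": "2", "Comment": "AND found"},
    {"Query": "Fish", "Expected": "0", "Comment": "Term not present"},
]
print(expand(template, rows))
```

The output is an ordinary script in the style of the Script section, so the same runner can execute it.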

Conclusion

There are several options for creating and running acceptance tests. Each team will have to decide which option (or combination) will work best for it.


[May 15, 2001.]