Tag Archives: atdd

Agile '06 Workshop: Example-Based Specifications

This workshop was designed to explore Example-Based Specifications. Hosted by Brian Marick and Bill Wake.

Schedule:

  • 1 hour – One person plays customer, the rest of the team writes tests
  • 30 minute debrief
  • 1 hour – Small group discussions
  • 30 minutes – sharing and conclusions

Test Writing

Group 1: Domain – a statistical tool that evaluates inputs according to a formula, and plots the results.

Their test:

(A, B, and Y are inputs; b0, b1, b2, and b12 are outputs, produced by a process of matrix algebra.)

A B Y b0 b1 b2 b12
10 75 8 11.0 5.0 8.0 12.0
20 75 12
10 100 10
20 100 15

This is a textual representation of the graph

b0 b1 b2 b12 B-1 B-2 B+1 B+2
0 5 8 12 (1,1,RGB) (10,10,RGB) (10,1,RGB) (1,10,RGB)

Their expected graph (described here, though the team drew a picture):

  • a large X with each line in a different color
  • labels +1 and -1 in the graph
  • a horizontal axis labeled "A" (also having labels -1 to +1)
  • a vertical axis ("Y")
  • a graph name ("B").

Group 2: Domain – a session scheduler, matching audiences to rooms

Test: Schedule a session -

Preconditions: user has logged in, room size < audience size

  • TU35 has audience 100
  • Room NICA has capacity 25
  • TU35 is not scheduled
  • NICA is available Monday 11-12:30

Execution:

  • User assigns TU35 to NICA at Monday 11-12:30
  • System displays a warning, "Room size mismatch… Continue or cancel?"
  • User selects "Continue"

Expected Results:

  • TU35 shows up scheduled in NICA, Monday 11-12:30

Group 3: Domain – a graphical tool for tracking and identifying criminal conspiracies. (Tool helps build a network of connections between people).

Their test:

Select template criminal
Create node
Create node
Select link template ?
Create link
Check count links 1
Delete node 1
Check count links 0
Create node policeman
Set first name Tom
Set last name Nassau
Create criminal
Set last name Soze
Select link template arrest
Create link Nassau to Soze
Check link exists
Select template policeman
Create node
Add role
Select node 1
Check has role policeman
Check has role drug dealer


Group 4: Domain – an expense approval manager

A GUI mockup: fields for "Manager approval" and "Purpose", plus an expense grid with columns Type, Date, Description, Amount, and Needs Receipt. The sample row showed an American Airlines flight, SEA to MSP, $600.00.

Issues: editing, are you sure?, manager approval, expense group

Test:

Type Date Vendor v Description Amount
Air tix 7/16/06 AA ORD to MSP 763.42

    Purpose: Trip to agile conference
    Manager approval:    Pointy-haired boss

An analysis of states:

New – Open, can’t pay, can’t approve, can’t close
Submitted – Open, can’t pay, can approve, can’t close

State Is Open Can Pay Can Approve Can Close
New yes no no no
Submitted yes no yes no
Approved
Paid
Closed

 


Group 5: A stock trading program

A screen mockup with annotations:

Start Time [   ] v : [   ] v 9:30-16:00 ET (market start to market end)
End time [   ] v : [   ] v
Stock Symbol [    ] 1-4 alpha
# Shares [    ] 1
Order Size [    ] 100->1 million (int) (+- 100)
  [Buy/Sell]  
Price $ [    ] (optional) numeric + 2 decimal optional
     
  [OK]    [Cancel]  

Test of algorithm:

  • Start time >= now >= 9:30 EST
  • End time > start time <= 16:00 EST
  • Distribution list (file) exists
  • Test invalid symbol entry
  • Test for valid symbol entry
  • Test all shares are traded if no limit price
  • Test trades don’t violate limit price
  • Buys or sells based on selection

Group 6: Insurance tracking program

Test for story "Insurer adds provider":

Test 1 – Sunny day

Add table:

Johnson, David name 15ch, 15ch
1200 Nicolette Dr addr1 40ch
12345 addr2 5d or 5d-4d
123456789 taxid 9 ch num
ALLINA network id must exist in db

Read: (same)

Test 2 – Invalid zip code (and more like this…)

Johnson, David name 15ch, 15ch
1200 Nicolette Dr addr1 40ch
1234 addr2 5d or 5d-4d
123456789 taxid 9 ch num
ALLINA network id must exist in db

Expected error: "Invalid zip code – should be 5 digits"

Test for story "Analyst approves pending claims < 60 days since claim made"

Test 1 – Sunny day

Populate database – set up claim 1

Date of claim 30 days ago
Member name Doe, Jane
Provider name Johnson, David
Diagnosis 769
Charge $500.00
Network id ALLINA

Step 1. Analyst views pending claims < 60 days (display claim A => OK)

Step 2. Analyst approves pending…

    Signal "ok" = submit
    Claim disappears from list
    Message "approved"

Step 3. Read claim, status "approved"


Group 7: Domain: Shipping company.

Test 1: Customer ABC calls to find out where the shipment with order #33 is. The system should answer, "Last stop was Tampa, FL."

 

Note Order # Request from Customer Answer from System
Truck left origin #33 last stop? Tampa, FL
Not in truck #34 last stop? nothing
Truck at destination #35 last stop? Daytona, FL
Truck at destination #35 arrived? yes
Not in truck #34 arrived? no
Not arrived #34 expected date? 26/7
Arrived #35 expected date? 25/7
Truck underway #33 arrived? no

Example context:

# Pick up origin (time) Final drop destination (time) Expected Date
#33 Tampa, FL (24/7 8 AM) Toronto, ONT 27/7
#34 Vancouver, BC Toronto, ONT 26/7
#35 Toronto, ONT (24/7 17:00) Daytona, FL (25/7 10:00) 25/7

Test 2: I want the system to help me minimize empty truck displacement. For example, I want to be able to ask if there is an empty truck in Ontario on July 27. Arrival within 2 days.

Empty truck in? Shipment order  
Ontario/27/07 #33, #34 truck at city
Ontario/25/07   no truck because wrong date
Montreal/QC   no truck because wrong location

Debrief

We noticed these things from the experience of writing tests:

  • What to do is vague.
  • Developers tend to embellish.
  • Tests teach developers, but it’s a challenge to pick the right level.
  • It’s hard to turn descriptions into tests.
  • Tests (and collaboration) can help you discover new things.
  • The idea of a GUI became actions on a model (for the test).
  • Customers should come in prepared.
  • We need many questioners per answerer.
  • Someone came in late, and found that the examples helped them understand.

Topics

In small groups, we explored these topics:

  • Sufficient coverage: How do you know you’re done – what’s sufficient coverage?
  • Test styles: What is the relationship between example-based specifications and other styles of tests?
  • Product Owner: Techniques for interviewing product owners

Sufficient Coverage

  • When adding tests to a legacy system, how much do you "backfill"?
  • Should we just use "change detector" tests (record what the system currently does, knowing that it may change later, and may not even be correct)?
  • Do we need all combinations?
  • Where do we fit example-based tests?

Test Styles

We suspected these things about example-based tests:

  • They may bring about exploratory testing, regression tests, better unit tests, acceptance tests. Load tests?
  • They provide a concise account of an edge case.
  • They serve to train new developers in a domain.
  • They provoke a certain style of conversation.
  • They may overwhelm developers with distracting detail (no "metaphor").
  • Alternatively, developers may ignore examples! It happens…

Product Owner

  • Use Sophie’s Rule – keep asking "why?"
  • Ask the end goal, define the business problem, define acceptance criteria
  • Discuss requirements "as if you were blind" (without reference to UI)
  • Need a customer who is readily accessible
  • Helps to talk to end users
  • Programmers need to understand the domain
  • Let product owners ramble

General Notes / What Next

  • Mind maps have utility in these conversations.
  • Examples are a style of conversation – the easiest kind to get.
  • "We’re going to practice" – "process miniature" exercises could help.
  • A whiteboard and pointing are powerful ways to focus attention.

Extreme Test Makeover – Agile ’06

Brian Marick and I hosted "Extreme Test Makeover," where people could bring their laptops with code and tests, and have an experienced tester/programmer review them. Observations by participants:

  • Watij tests in Fit are too long/confusing for a customer to read.
    • You could write them in JUnit instead of Fit
    • Break them up into small, focused tests
  • Neat new delegate syntax (with .Net 2.0)
  • Descriptive variable names are good [even] for short term variables.
  • Keep tests focused on one purpose – If a test needs 3 things to work, create 3 tests
  • Generic code isn't always useful.

Thanks to our participants and our experts: Bob Martin, Brian Button, Janet Gregory, JB Rainsberger, Jim Newkirk, Jim Shore, Lisa Crispin, Micah Martin, Randy Coulman, and Ward Cunningham.

Review – Fit for Developing Software

Fit for Developing Software, by Rick Mugridge and Ward Cunningham.

[My bias disclosure – I know both Rick & Ward, I was a reviewer, and I’ve written for their publisher myself. This review is substantially as posted on the agile-testing group.]

Fit (see http://fit.c2.com) is a testing framework that Ward Cunningham developed. A test author writes tests as tables in a document that can be converted to HTML (e.g., Word, Excel, text editor, etc.). The programmers develop fixtures that connect to the system under test. Fit mediates the tests and the fixtures to run the tests, and captures the results. It colors in the document using red/yellow/green to show what happened.

The book is in two halves. The first half is targeted at test authors: from the user perspective, how does Fit work? It covers the basics of tables, fixtures, error handling, and so on. Then it goes into an extended example (several chapters) following a team developing rental software. The first half closes with advice about designing better tests.

The second half is targeted at programmers. While programmers should really read and understand the first half of the book, test authors will probably at most skim this half. This half starts by explaining how to implement various types of fixtures. Then it continues the earlier rental software example by showing the fixture code that would be developed. Finally, this half closes with some advanced topics: Fit’s architecture, custom fixtures and runners, and model-based test generation.

The authors have done a good job explaining Fit from both the test-writing and programming perspectives. The text is clearly written, with plenty of examples, frequent breaks for Questions and Answers along the way, and exercises at the end of many chapters.

This book is unique. While you can find information about Fit and fixtures on the web, what’s on the web is much less readable than what this book provides. The book also gives you an extended example and helpful advice from two experts.

If you are considering Fit, or just want to understand its philosophy, this book provides the clearest explanation I’ve seen. For test authors, the first part of the book justifies the whole price. For programmers who need to understand how and why fixtures work, it’s even more of a bargain.

Fit for Developing Software, by Rick Mugridge and Ward Cunningham, with a foreword by Brian Marick. Prentice Hall, 2005. ISBN 0-321-26934-9.

Fit Reading (Part 8 of 8) – RowFixture

A RowFixture is used to test that a set of items is as expected. The fixture flags surplus or missing items. A RowFixture table looks like this:

MyRowFixture
first last status()
Alexander the Great ok
Alexander the Mediocre unknown
Winnie the Pooh lagging

Each row represents a domain object of some sort. The columns have inputs and outputs, as for ColumnFixtures.

Data and Abstract Methods

First, I'll note that this fixture extends ColumnFixture. This lets it pick up bind() and check(). The former handles the "header" row; the latter makes sure execute() is called. But due to the way the overrides happen, execute() is called under different circumstances than it is for ColumnFixture. I don't see anything that will call reset() on a per-row basis.

The fixture holds three bits of data: an array containing the results of the query, a list of surplus items, and a list of missing items. From looking at usages, I see that the list of surplus items is a list of domain objects; the missing list is a list of Parses.

The first abstract method is query(). It is responsible for producing an array of the "actual" results. The second abstract method is getTargetClass(). It returns the class object representing the type of the row. It's abstract for an interesting reason: the parent class ColumnFixture defines that method to return the type of the fixture itself. That would just lead to weird errors. By making it abstract, it forces the user to override it.
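
To make that concrete, here's a minimal sketch of a RowFixture subclass matching the table above; the Person class and its contents are hypothetical stand-ins for a real domain type and a real query.

import fit.RowFixture;

public class MyRowFixture extends RowFixture {

    // Hypothetical domain class; its public fields and methods line up with
    // the column headers (first, last, status()).
    public static class Person {
        public String first;
        public String last;
        private String status;
        public Person(String first, String last, String status) {
            this.first = first; this.last = last; this.status = status;
        }
        public String status() { return status; }
    }

    // Produce the "actual" items to compare against the table rows.
    public Object[] query() throws Exception {
        return new Object[] {
            new Person("Alexander", "the Great", "ok"),
            new Person("Alexander", "the Mediocre", "unknown"),
            new Person("Winnie", "the Pooh", "lagging"),
        };
    }

    // Tell the fixture what type each row represents, so columns can be bound.
    public Class getTargetClass() {
        return Person.class;
    }
}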

This is an interesting twist – usually my abstract methods are at the top of the hierarchy, and may get filled in along the way down. In this case, the method is becoming abstract in the middle.

In a sense, that happens because RowFixture and ColumnFixture have a slightly strained relationship. Maybe I'm just not getting why the latter is an example of the former; it feels like the inheritance is more for implementation than anything.

doRows() – The Overall Algorithm

The main algorithm is in doRows(): bind the columns (ala ColumnFixture), run the query() to get a list of actuals, run a match(), then add rows for any surplus or missing items.

Along the way, this method calls two overloaded list() methods: one for making a list of Parses, the other for making a list of objects. This parallel structure (methods for each of the two main data structures) continues across the class.

Method doRows() calls buildRows() to add in new rows for the surplus values. This method works by building up a "fake" head of a parse chain, then adding each item to the last one in the list. In the end, it throws away the head and returns the interesting part of the list, which gets attached to the end of the table. This seems like a little pattern worth remembering if I ever need to add rows to a fixture.

match() – The Heart

The match() routine is a recursive algorithm. Given the list of expected items, the list of computed items, and a column to start looking in, it figures out what matches and what's missing or surplus.

Since this algorithm looks complicated, I'm going to start by just looking around. First, what are the places that call it? By doRows() certainly, since that's how we got here. Then it's called recursively at two places inside match() itself. The good news is we don't have some sort of pair of mutually recursive methods.

Recursive algorithms have a base case and a recursion case. The recursive case here is just incrementing column, and passing along lists. The column is always incrementing, and the first if says that if we've exceeded the number of columns, we should just do a check on the lists. That makes it look like we'll always terminate: we either increment column, in which case we'll eventually stop, or we do something that doesn't recurse, which will stop as well. (Or rather, if it doesn't, it won't be because of the recursion here.)

The other thing to look at on these recursive calls is the lists. We know the column gets bigger – do the lists get smaller? One case passes on the originals, so we know it's no bigger. The other is trickier – I see things that seem to indicate that the lists will shrink (tests for 0 or 1 item), but it's not obvious that it must be so without a little digging.

So, from the top of match(): the first case we mentioned before – if we're past the number of columns, do a check() on the currents lists. (We'll come back to that method later.) The second case says that if the current column binding is null, move on to the next column. The third and final case is where the meat is: we're in a column in the middle, trying to match.

So, we build up two maps: one for the expected, one for the computed ("actual"). Each map has a list of items that have the given value at the chosen column. We pull out the keys in either list, and work our way through them. Here, there are four interesting cases:

  1. The expected list is empty – we have a value found only in the computed list, so add it to the surplus list.
  2. The computed list is empty – we have a value found only in the expected list, so add it to the missing list.
  3. There is only one value in each list, and they have the same key (by how we got here) – check this row (actual vs. computed).
  4. Finally, both lists have more than one item with the same key value – recurse, but only on the list of items with this same key value. (Now I can see that I'm recursing on a list that's no bigger. It could be the same size if all the keys are the same.)

I'm left to wonder – does this work as a set or a multi-set? That is, can we have completely duplicate rows if the "same" object is in the list twice? I'll come back to this.

eSort() and cSort()

There are two "sort" routines, one for expected and one for computed. They're pretty similar, so I'll describe them together. (I don't get why they're a sort of any sort, though.)

Each routine produces a Map, from key values to a List of Objects (either domain objects or Parses). The bin() method takes care of putting items in the map. That method expands an Array into a List; the RowFixtureTest mentions a bug in that neighborhood and I suspect this is to address that.

The sort() methods handle exceptions and rows with no value in a particular cell.

Back to the Algorithm

I think I understand what's going on enough to put it into words now. To make a match, we start in a given column. Each list gets divided into buckets, based on the value of the cell at that column. If buckets have 0 or 1 item, then we have a good enough match. Otherwise, we'll look at more columns for those items. Eventually, we'll run out of items or columns.
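
To check that understanding, here's a rough sketch of the bucketing recursion in plain Java. It is not the fixture's code – rows are just string arrays and the names are mine – but it follows the same four cases.

import java.util.*;

public class MatchSketch {

    public static void match(List<String[]> expected, List<String[]> computed,
                             int col, int columnCount,
                             List<String[]> missing, List<String[]> surplus) {
        if (col >= columnCount) {                 // past the last column: pair off what's left
            int n = Math.min(expected.size(), computed.size());
            // the first n pairs would be checked cell by cell, as Fit's check() does
            missing.addAll(expected.subList(n, expected.size()));
            surplus.addAll(computed.subList(n, computed.size()));
            return;
        }
        Map<String, List<String[]>> eBuckets = bucket(expected, col);
        Map<String, List<String[]>> cBuckets = bucket(computed, col);

        Set<String> keys = new HashSet<String>(eBuckets.keySet());
        keys.addAll(cBuckets.keySet());
        for (String key : keys) {
            List<String[]> e = eBuckets.containsKey(key) ? eBuckets.get(key) : new ArrayList<String[]>();
            List<String[]> c = cBuckets.containsKey(key) ? cBuckets.get(key) : new ArrayList<String[]>();
            if (e.isEmpty()) {
                surplus.addAll(c);                // value seen only on the computed side
            } else if (c.isEmpty()) {
                missing.addAll(e);                // value seen only on the expected side
            } else if (e.size() == 1 && c.size() == 1) {
                // unique match on this key: this is where the row-by-row check happens
            } else {
                match(e, c, col + 1, columnCount, missing, surplus);  // disambiguate with more columns
            }
        }
    }

    private static Map<String, List<String[]>> bucket(List<String[]> rows, int col) {
        Map<String, List<String[]>> map = new HashMap<String, List<String[]>>();
        for (String[] row : rows) {
            List<String[]> bin = map.get(row[col]);
            if (bin == null) {
                bin = new ArrayList<String[]>();
                map.put(row[col], bin);
            }
            bin.add(row);
        }
        return map;
    }
}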

check()

The last interesting bit is around the check() routine. It goes through and checks the columns one at a time, using the normal TypeAdapter facilities. The routine is recursive: it peels off the front of each list, recursing until one or both lists is empty.

Leftovers and Learnings

I had a question about whether it acted like a multi-set or a set. It looks like it's basically multi-set-like, from a simple test with a list of integers.

The other big thing for me to wonder is how I'd have done a similar fixture. I think I'd have expected a simple set. The problem is, that's fine for the query() side, but not so good for the "expected" side: how would you get from those values to construct the objects to compare as sets? (Knowing the contents of an object's fields and return values from its methods doesn't tell you how to construct it.)

Another alternative would be to get the query values, and match each one up against the rows. The naive algorithm for this is a little slow (n^2). It might be a bit simpler. I suspect its report wouldn't be as nice.

The current algorithm is able to take advantage of partial matches – if enough data cells make it a unique match, it can then know it has the "right" element even though some of the fields/methods are wrong.

Closing Out…

That concludes my tour of Fit. I focused on the main fit package, skipping a couple more minor classes. The code reading was a good exercise for me – I have a better sense of some of the tradeoffs in the code, and of the dynamics in the Fit community.

Fit Reading (Part 7 of 8) – ColumnFixture

ColumnFixture is an easy fixture to understand from the user's point of view: each row is a test case, with some columns being inputs, and others being outputs:

MyCalculatorFixture
x y plus()
0 2 2
1 1 2
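
The fixture behind a table like that is typically just public fields for the input columns and a method for each computed column. A minimal sketch (the arithmetic stands in for a call into the real system):

import fit.ColumnFixture;

public class MyCalculatorFixture extends ColumnFixture {
    public int x;    // input column "x"
    public int y;    // input column "y"

    public int plus() {    // output column "plus()"
        return x + y;
    }
}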

doRows() – Capture the Header Row

Method doRows() calls bind() to peel off the header row, then processes the rest of the table. Bind() creates an array of TypeAdapters, one per column in the table. If the column header cell is empty, the ColumnBinding is set to null. (An opportunity for a Null Object? Later, we check for null.)

If the cell ends in "()", it's a method call, and set via bindMethod(). Peeking inside there, it camel cases the name (so "shirt size()" becomes "shirtSize()") and uses the TypeAdapter.on() method to create the adapter.

Otherwise, the cell is assumed to be a field name, set via bindField(). That helper method also camel-cases the name and uses the "field" version of TypeAdapter.on().

Any problem in the header parsing marks the cell with an exception.

doRow() – Handling the Basics

DoRow() is fairly simple. It calls a stubbed-out reset() method before handling anything in the row. This lets a fixture reset anything on a per-test basis. It lets the normal row-processing take place, then it checks if execute() has been called; if not, it calls it.

Execute() is to be called before processing the first column that represents a method name: you can have a bunch of inputs, use execute() to make things happen, then check the outputs. If you don't override execute(), then you either have to know which column is first and make it kick things off (which is brittle), or you let each output column compute "from scratch". (Reading it here makes me wonder if I've been diligent about this in all my ColumnFixtures:)

If there are any unhandled exceptions, they're attached to the first leaf cell of the table.
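
Here's a small sketch of how a fixture might use those two hooks; the discount rule is a made-up placeholder for a call into the real system.

import fit.ColumnFixture;

public class DiscountFixture extends ColumnFixture {
    public double amount;       // input column
    private double discount;    // computed by execute()

    public void reset() {       // called before each row
        amount = 0;
        discount = 0;
    }

    public void execute() {     // called once, before the first output column
        discount = amount > 1000 ? amount * 0.05 : 0;   // placeholder rule
    }

    public double discount() {  // output column "discount()"
        return discount;
    }
}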

doCell() – Per-Cell Handling

This boils down to four cases: empty text, null TypeAdapter (mentioned earlier), field, and method. For empty text, we call execute() and move to the superclass' handling of the cell, which marks the cell as "info". This seems a little odd to me – why should a missing value trigger that?

I'm going to write a test for it, but it took a couple minutes to figure out how to do so. I think what I'll do is create a table with a default value for x and y, leave x blank, and have execute() print the value of y. If it's called when x is processed, it'll print the default value rather than the one that was set.

(Pause)

OK – I'm back, and it does act like I expected from the code – execute() is called for a blank cell. I guess that makes sense to do if the blank cell were a call cell (like plus()); I'm not clear on the value for an input cell.

Back to doCell(): if the TypeAdapter is null, it ignores the cell. If it's a field, it parses the text and sets the field. If it's a method, it calls check().

check() – Calling execute()

Method calls go through the (overridden) check() routine. This is really here to make sure the execute() method gets called if it hasn't yet. Then it just defers to the superclass version, which calls the method and compares the result.

What I've learned

  • Don't forget about reset() and execute()
  • Execute() is a little weird in the face of blank input cells.

Semantics of Fit: A Path Toward New Tools

Fit's standard interpretation tells us how well a program does against a set of test cases. We can design new semantics for reporters (that give us interesting information) and for rewriters (that make interesting transformations of our tests).

The Standard Interpretation

A Fit test looks like a table (or a series of tables) in an HTML document:

Calculator
x y plus() times()
2 1 3 2
0 3 3 0
2 -1 1 -2

 

fit.ActionFixture
start ScientificCalculator  
enter value 2
press plus  
enter value 1
press equals  
check value 3

But what does a Fit test mean?

This is almost a trick question. One answer is: whatever people agree it means. Another answer is: whatever the Fit framework, the fixtures, and the application make it mean. These two things don't always line up, unfortunately.

Even when we think we're talking about the same interpretation, there can be differences in what a test means. One version of the Calculator fixture could work with the business objects; another could work at a presentation layer; another could work by manipulating the user interface. These fixtures might find different types of defects; they might be easier or harder to write; and so on.

These all generally assume a standard interpretation: use Fit to run the fixture on the table. The good side of this is that it gives us something we care about: an answer to the question, "How well does our program work?"

I'd like to move to a different question:

What can we know about our tests if we don't understand the fixtures?

Alternative Semantics

A different interpretation of the directory containing my test above might yield this table as a result:

Index
Fixture File Table Number
Calculator MyTest.html 1
Calculator OtherTest.html 3
fit.ActionFixture MyTest.html 2

This is a cross-reference chart, showing where every fixture is used. I can get this information with no reference to the fixture implementation at all: it's just an index of the first cell of each table.
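
A rough sketch of such a reporter, using nothing but Parse navigation; the directory name and the file handling are simplifying assumptions, not part of Fit.

import fit.Parse;
import java.io.File;
import java.nio.file.Files;

public class IndexReporter {
    public static void main(String[] args) throws Exception {
        File[] files = new File("tests").listFiles();   // hypothetical test directory
        for (int f = 0; f < files.length; f++) {
            String html = new String(Files.readAllBytes(files[f].toPath()));
            int tableNumber = 1;
            // Top-level tables are chained together through "more".
            for (Parse table = new Parse(html); table != null; table = table.more) {
                String fixture = table.parts.parts.text();   // first cell of the first row
                System.out.println(fixture + "\t" + files[f].getName() + "\t" + tableNumber);
                tableNumber++;
            }
        }
    }
}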

We can find out other interesting things with just a little knowledge of fixtures. For example, if we know that "Calculator" above is a ColumnFixture, then we know that the first row consists of input and outputs. Even without knowing anything about how the fixture is implemented, we can create another interesting table:

Vocabulary
Fixture Field Value
Calculator plus() 1
Calculator plus() 3
Calculator times() -2
Calculator times() 0
Calculator times() 2
Calculator x 0
Calculator x 2
Calculator y -1
Calculator y 1
Calculator y 3

This table gives you a picture of the input and output domains. A good tester could look at this and notice all kinds of things that weren't tested:

  • no large or small numbers (in either the input or output)
  • no non-numbers
  • no sums that resulted in 0
  • no sums that were negative
  • no sums that were even
  • only even numbers for x, odd numbers for y
  • minus(), divide()
  • etc.

We could create a tool that would give us a domain analysis of our test data:

Test Data for Calculator
Field Max Neg Neg Zero Pos Even Odd Max Pos
plus() no no no yes no yes no
times() no yes yes yes yes no no
x no no yes yes yes no no
y no yes no yes no yes no

 

Reporters and Rewriters

Two types of interpretations stand out for me:

Reporters – tell you something about a set of tests

Rewriters – change a set of tests

The Index and Vocabulary examples above are reporters. The standard Fit interpretation is close to being a rewriter: it produces a version of the input file, modified to show the results of the tests. (The only thing that keeps it from being a true rewriter is that it leaves the original file in place, so you can run it again.)

Here are some more ideas for useful tools:

  • Count test cases – A reporter telling the total number of test cases for a fixture
  • Count fixture – A reporter telling how many times a fixture occurs
  • Operators – A reporter telling the column names in RowFixtures and ColumnFixtures (or the second column of ActionFixtures)
  • Operands – A reporter telling the data values used (like "Vocabulary" above)
  • Column changer – A rewriter that can rename, delete, or insert a column
  • Cell rewriter – A rewriter that can change cell values (for a specific fixture in a specific column or row)

(I've created a simple version of the Index example above using the AllFiles fixture; the others are speculation.)

Reflection is OK

There are useful semantics that don't try to interpret a fixture, but it can help to "peek" a little. For example, knowing that something is a ColumnFixture tells us that it's likely that the row after the fixture name consists of input and output fields. We can use this information fruitfully. The Vocabulary example above made use of this knowledge.

Furthermore, there is nothing wrong with getting help. If someone had a new type of fixture that subclassed Fixture, but still had ColumnFixture-like semantics, they could provide a helper analysis class that would let us know this.

The goal is not to avoid using fixture-aware code, it's just to avoid the quagmire of trying to interpret another program.

Call to Action

We've had a few years to work with Fit. People are creating test suites large enough to be interesting, and large enough that they need help managing them.

It's time to experiment with new interpretations of Fit tests. (We still may use Fit to help with this task.) The need is there now, by real people doing real work.

[June, 2005.]

Fit Reading (Part 6 of 8) – TypeAdapter

C# Fit

I’ve gotten some mail letting me know that the C# Fit has forked a bit – there’s a newer version that’s the regular Fit distribution, and an older/modified version that’s part of Fitnesse. I was having trouble extending the Fitnesse version. There’s an effort to do some unification work this summer; that should help.

TypeAdapter

TypeAdapter exists to give a common interface to types, so they can all have setters, getters, and parsers. There are three factory methods, all named on(): one takes a fixture and a class, another a fixture and a field, and a third takes a fixture and a method.

This gives the unification of fields and methods. In Fit, you can have a ColumnFixture with a field, and it has an implicit setter ("name") or getter ("name()"). Or you can have a method (also "name()"). For most purposes, we don’t care what it is, we just want to treat it as a setter or getter.

The TypeAdapter has five fields:

  • target – the fixture this adapter is "on" (set for a field or method, but not a type)
  • fixture – the fixture this adapter is "on" (always set)
  • field – the Field (only set for field references)
  • method – the Method (only set for method references)
  • type – the type, always set

I’m struck by the combinations of "this field set / that one not" – would a couple helper classes be an improvement?

Methods

The get() method tries to do a field access if the adapter is a field, or a method invocation if it is a method.

The set() method does a field set. (Could we extend this to call "setter methods" with a signature like setFoo(Foo value)?)

The invoke() method assumes that a method is set, and calls it with no parameters.

The parse() method asks the fixture to parse the string according to type. In C#, parse() is something each type (even primitives) defines. I’m sure that simplifies some of this code.

So let’s say we have a ColumnFixture where phone() is of type PhoneNumber. How do we make that get parsed naturally? It looks like it works its way back to Fixture, which has a parse() method. So the ColumnFixture overrides it, and checks for an attempt to parse a class it knows about.

It seems like we could do some fanciness here, too, pushing an attempt to parse onto the domain classes. (So let Fixture.parse() take a look for "type.getMethod("parse")"; if we don’t want that we could subclass and override to avoid it.)
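
Here's a sketch of that override, with a hypothetical PhoneNumber class; the parse(String, Class) signature is the one I see on the Java Fixture.

import fit.ColumnFixture;

public class DirectoryFixture extends ColumnFixture {
    public String name;

    public PhoneNumber phone() {
        return new PhoneNumber("612-555-1212");    // placeholder lookup
    }

    // Handle our own type, then fall back to Fixture's parsing.
    public Object parse(String s, Class type) throws Exception {
        if (type == PhoneNumber.class) return new PhoneNumber(s);
        return super.parse(s, type);
    }

    // Hypothetical domain class; equals() matters because check() compares objects.
    public static class PhoneNumber {
        private final String digits;
        public PhoneNumber(String digits) { this.digits = digits; }
        public boolean equals(Object other) {
            return other instanceof PhoneNumber
                    && digits.equals(((PhoneNumber) other).digits);
        }
        public int hashCode() { return digits.hashCode(); }
        public String toString() { return digits; }
    }
}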

Primitives and Their Classes

The rest of this file is a whole bunch of subclasses of TypeAdapter: one for each primitive type, and one for each corresponding Class. Most of these are the same: the primitive type’s adapter is a subclass of the Class one. The primitive’s adapter defines set(), and the class one defines parse().

The last one is the only exception: Arrays. There, the parse() method tokenizes it by looking for commas. Like so many other places in Fit, it trims spaces. Each element of the array is given its own TypeAdapter. The toString() method puts the commas in when printing it out.

Reflection

The big surprise here is the idea of unifying methods and fields. I’m not sure how I’d have come to the realization that they’re the same at a level we care about. (That insight is of a piece with the whole framework – I’ve understood reflection for years, but I’ve used it more for plugin-style work than anything like Fit that uses test data to drive the reflection.)

Fit Reading (Part 5 of 8) – ActionFixture

Wow – this one is a lot cleaner than I expected. I had tried overriding the C# version and had all kinds of grief. This version is straightforward and extensible.

Fields

The class has three fields:

  • cells – a Parse
  • actor – a Fixture
  • empty – an array of Class

Cells holds the list of cells for this row. It’s used by the action methods (such as enter()) to pull out data from the row.

Actor holds the object created by a start() action. Notice that it is static – that is what lets separate ActionFixture tables keep working with the same object without repeating start in every table.

Empty is the easiest – it’s just an empty list so that things that want a list of argument types can have one. I marked it final, since it’s never changed.

Methods: doCells()

The first method is doCells(). It saves off the cells, so other methods have access to this row’s Parse. Then it looks up the method in the first cell, and invokes it. (This method will be one of "start," "press," "enter," or "check.")

The fixture invokes the method on itself by "getClass().getMethod()" – looking for the method on itself. This is a place where the Java version is nicer than the C# one. The C# version hard-coded that line to the equivalent of "ActionFixture.class.getMethod()". That meant that a subclass of ActionFixture would only have access to the four methods ("start" etc.) originally planned. The Java version lets you extend this vocabulary easily.

Another thing to notice is that the fixture calls getMethod() on cells.text(), not camel(cells.text()). That’s a pity – my extended vocabulary has to be spelled exactly. (I don’t think the rules for camel casing are consistent throughout. I’m probably getting hung up from Fitnesse experience – I think it may have slightly different rules.)

Methods: Actions

Start() is straightforward. It creates an object of the named type, and stashes it in the actor field. I note that it doesn’t camel-case its argument, so "start MyFrameObject" is different from "start my frame object". (The latter won’t work.)

Enter() looks on the actor for a one-argument method it can use as a setter. It creates a TypeAdapter, which knows how to parse objects, passing it the cell text. Then it invokes the setter.

Press() invokes the named 0-argument method on the actor.

Check() assumes its 0-argument method is a getter, fetches the result, and passes it to Fixture’s check() routine, which does the comparison and cell coloring.
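
A minimal, hypothetical actor for that four-verb vocabulary might look like this; the class and its behavior are made up, and what matters is the method shapes that enter, press, and check look for.

import fit.Fixture;

// The actor is itself a Fixture. enter looks for a one-argument method,
// press for a zero-argument one, and check for a zero-argument "getter".
public class Counter extends Fixture {
    private int count;
    private int amount = 1;

    public void amount(int value) {   // enter amount 5
        amount = value;
    }

    public void increment() {         // press increment
        count += amount;
    }

    public int count() {              // check count 5
        return count;
    }
}

A table would then read: start Counter (by whatever name loadFixture can resolve), enter amount 5, press increment, check count 5.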

Methods: method()

The two variants of method() try to find an n-argument method on the actor. The simple form camel-cases the name, so the methods on the started object can have a more user-friendly form ("start MyFrameObject" followed by "press the rightmost button"). It double-checks that there is only one possible method. (So, if "firstName(int)" and "firstName(String)" both exist, it will report that it doesn’t know which to use.)

Next time, I’ll take up TypeAdapter.

Fit Reading (Part 4 of 8) – Fixture

Fixture: Fields and Two Helper Classes

There’s a Map summary that accumulates things like the "run date." I don’t know why the top-level Fixture has this, but it does. The fixture fit.Summary walks through this table and gives summary statistics.

There’s a field counts that has counts of tests passed, failed, and exceptions/errors. The Counts class is just a data bag for these things. When a fixture calls wrong(), for example, the count is incremented.

The last field is args, which has the arguments from the first row of the fixture. The method getArgs() returns a String[] and lets a fixture use them. I don’t think this is in the C# version yet but we definitely use that sort of thing there.

There’s an internal class RunTime. It takes a snapshot of the current time. Right now, the only use of this is to put it in the summary, under the key "run elapsed time". Presumably some fixtures pull the RunTime object back out, and use toString() to display the elapsed time. But nothing in the standard distribution appears to use it directly. (Fit.Summary will display the elapsed time when it dumps the summary table.)

Starting Fixtures

Now we come to doTables(), the top-level method. (It’s called by FileRunner, passing in a Parse for each table.) This method first looks at the name in the first cell of the first row of the table. Then it tries to create the fixture, then use it via interpretTables(). Along the way I note that this routine is using a couple null checks; I wonder if those are necessary? If the first table’s fixture fails to be created and run, it runs the remaining fixtures via interpretFollowingTables().

Method getLinkedFixtureWithArgs() tries to load the fixture named in header.text(), then it sets up the arguments (for getArgs()).

The method loadFixture() takes the name of the fixture, and attempts to "new up" the named fixture via reflection. Between the last method and this, I’m worried by what I don’t see: what routine uses the camel method? That suggests a test: let’s load "fit.ActionFixture", "fit.Action Fixture", and "fit.Action fixture" and see what happens. From what I understood going in, all three should be ok. From what I’m seeing here, I don’t see what would make that work.

Why did I expect this? Because ColumnFixture does it for column names. It turns out that’s not a good enough reason. The test shows that only "fit.ActionFixture" loads.

Up to interpretTables() again. It does getArgsForTable() – again. There’s even a comment to that effect. I don’t see why it should be necessary, though. Actually – it’s all a little subtle, and I’d say the comment is misleading. The comment says, "// get them again for the new fixture object". But really, that’s what we did in getLinkedFixtureWithArgs(). Now we’re getting the arguments for the original fixture.

It works like this: when FileRunner starts, it runs doTables() on a new Fixture object. That’s the object that tries to pull fixtures from tables and run them. When the first table is seen, its arguments are pulled out and given to the corresponding fixture. But then they’re also copied back to the initial fixture as well. I imagine they’re actually rarely needed there.

At any rate, interpretTables() then calls doTable(), which does a straightforward job of working its way into doCell(). Finally, it calls interpretFollowingTables().

InterpretFollowingTables()

By the time we’re here, we run through a loop, looking up fixtures and then interpreting them with doTable(). For these, we don’t change the arguments on the fixture that started it all. Why not? I can only guess it has something to do with the way DoFixture wants to work – treating the first table as special.

All this work seems a little off – it seems the Fixture class is paying for interpretation that a particular table wants. I’m a long way off from looking at DoFixture, but if that’s the table that should be first, it seems to me like it should pay for this complexity. I know I’m second-guessing here…

Check()

The other routines are either straightforward, or I’ve looked at them already. The exception is a largish routine at the bottom: check(). This is a helper method, used by some subclasses. It deals with blank cells, null adapters, "error" expected (to deal with expected exceptions), and text that should match. In each case, this method puts the output in the cell, colored appropriately.

Up next…

I think I want to look into ActionFixture next. I had an unhappy session trying to extend the C# version (which appears to be older). Then I want to dig into how TypeAdapters work.

Fit Reading (part 3 of 8) – Parse and Fixture

Parse

First I want to chase down a couple oddities in what I saw last time. It boils down to these two tests:

// This test shows offset isn't applied the way I expected
// (the table markup inside the string literal is approximate)
public void testOffset() throws Exception {
    int offsetToData = 2;
    Parse p = new Parse("xx<table><tr><td>data</td></tr></table>",
            Parse.tags, 0, offsetToData);
    assertEquals("", p.leader);
}

and:

// This test shows cells with embedded tables "go away"
// (again, the table markup inside the string literals is approximate)
public void testInnerTables() throws Exception {
    Parse p1 = new Parse(
            "<table><tr><td>stuff plus</td></tr></table>");
    Parse p2 = new Parse(
            "<table><tr><td>stuff" +
            "<table><tr><td>inner</td></tr></table>" +
            "plus</td></tr></table>");

    Parse cell = p1.at(0, 0, 0);
    assertFalse(cell.body.equals(""));
    assertEquals("stuff plus", cell.text());

    cell = p2.at(0, 0, 0);
    assertFalse(cell.body.equals(""));
    assertEquals("stuff inner plus", cell.text());
}

(I sent email to the maintainers; these may just be demonstrating my ignorance of how it's intended to work.)

FixtureTest

It won't take long to look at this: it only has one test!

    assertEquals("&amp;&lt;", Fixture.escape("&<"));   // literals approximate – the originals were eaten by the HTML

The method basically handles converting plain text to have HTML entities.

But there's a little more going on in Fixture…

Fixture

I've only got a few minutes, so this is a quick overview. From the FileRunner, I can see that doTables() is an important entry point. Last time I looked in the Java version, it was simpler than it is now. There are comments showing code added for DoFixture and for fitnesse, and they've made it all a little trickier. This is all to make an improvement at the user level.

The core of this is: doTables() calls getLinkedFixtureWithArgs() and interpretTables(), which (eventually) calls doTable(), which calls doRows(), which calls doRow() once for each row, which calls doCells(), which calls doCell() once for each cell. In Fixture, doCell() calls ignore(), which marks the cell gray. ColumnFixture, for example, overrides doCell() to do something interesting (like look for expected results).

I'm going to skip over how the first table gets loaded and interpreted – it looks interesting (i.e., tricky:) Instead, I'll peek down to a section labeled "Annotation". This area contains methods that can mark and color cells: right(), wrong(), info(), ignore(), error(), and exception(). I've seen these called in several of the standard fixtures before.

I see where exception() puts the stack trace into the cell. That gets SO ugly when something goes wrong. (It makes the cell huge, full of scary content useful only to programmers.) Maybe someday I'll take a whack at a more readable version.

Utilities

The final section is Utilities. The lightly tested escape() method is there. There's also a method to put words into camel case. I'll add a test:

assertEquals("twoWords", Fixture.camel("two words"));
assertEquals("MiXedCAsE", Fixture.camel("MiXed cAsE"));
assertEquals("aFewWordsTogether", Fixture.camel("aFewWordsTogether"));
assertEquals("acronymsLikeHTMLStillUppercase",
        Fixture.camel("acronyms like HTML still uppercase"));

Hmm. This is perhaps not the pattern I'd have chosen. I have the impression some of the other languages do it differently.

There's a parse() method that appears to handle only Strings, Dates, and ScientificDoubles. I know that C# works a little differently, since parse() is more integrated.

There's a check() method that looks too complicated to understand in 30 seconds. It uses a TypeAdapter, which is another class I want to look at soon.

Finally, there's a method to get the arguments from a Fixture. This is new – it used to be that the first row had the fixture only. (Rather, you had to parse any arguments out yourself.) Now that's built in, accessed via getArgs(), which returns a String array.

That's it for today. Next time, I want to dig into how fixtures get started (since this has changed some), and into the check() method.

Fit Reading (Part 2 of 8)

Inside Parse

I spent last time on tests only – this time I want to go inside the Parse class.

The top of the class reveals strings for leader, tag, body, end, and trailer, as expected. There are also parts and more, which are Parses. A skim through the class, looking for big routines, shows that the constructor, findMatchingEndTag(), removeNonBreakTags(), print(), and footnote() methods seem to be the biggest and most complex.

Footnote? What’s that? The tests didn’t mention it! Looks odd – it’s not referenced inside fit anywhere; rather, it’s used by some clients, typically after a call to wrong(). It appears to create a file Reports/footnote/n.html, and prints the parse to it.

My strategy today is to chew off the routines that are small and/or simple, then go back and figure out the big routines. I have two things I’m trying to understand: "What happens with nested tables?" and "How do I insert stuff into the middle of a Parse?" (I need the latter for fixtures that want to report a little more nicely.) I guess I have a third question too – "how are spaces handled?" This arises because I saw a note on the mailing list that says there are differences in the various fit implementations.

Small Fry

There are some small and simple recursive routines: size(), last(), leaf(), and at(). There are a bunch of little routines for escaping characters and dealing with HTML; I’ll come back to those.

There’s a little helper routine addToBody() that just appends text to the body. That doesn’t sound like much – and it’s a one line routine, basically "body = body + text", but a search for usages shows that this is what fixtures use to get their info into the output. (If a fixture wants to show a cell’s expected value, it uses this method to append some HTML text to the cell’s Parse.) That answers one of my questions. I’ll have to play with it to learn it better.
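
For example, a fixture that wants to decorate every cell might do something like this (a toy illustration; the message text is arbitrary):

import fit.Fixture;
import fit.Parse;

public class AnnotatingFixture extends Fixture {
    public void doCell(Parse cell, int columnNumber) {
        cell.addToBody("<hr/>seen by AnnotatingFixture");  // append extra HTML to the cell
        ignore(cell);                                      // gray the cell, as Fixture does by default
    }
}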

The print() routine is longer than these one-line methods, but looks straightforward. It writes the Parse out: leader, tag, then either body or parts, the end tag, and either the more or the trailer. I knew body and parts were mutually exclusive; I hadn’t realized that more and trailer are exclusive as well. I wonder if body and trailer appear together, and parts and more appear together? If so, I wonder about splitting Parse up so subclasses can deal with that difference. It’s not a huge class; may not be worth it.

Constructors

That leads me up to the first constructor – Parse(String tag, String body, Parse parts, Parse more). Note that it has parameters for both body and parts. So much for my theory of a paragraph ago. But it’s close – I did a search and found 15 places that called this constructor. All but three used either body or parts exclusively.

One of the ones that didn’t is in fat.Table. It is using this constructor to copy an existing Parse. That looks misplaced – if we need to copy these, then we can put a method on Parse to do so. A second place is fat.FixtureNameFixture. The GenerateRowParses() method passes in a string for "body" and a Parse for "more". (So we have an example where "parts" and "more" don’t go together.) I can’t tell why it does this on a quick look. The final place is eg.AllFiles.td(), which also uses "body" and "more" together.

The first constructor passes in all the pieces separately. Then there are a few constructors that default tags and so on, leading up to the main constructor that actually parses some HTML. That constructor looks for several key positions in the input: the start of the target tag, the end of the target tag, the start and end of the corresponding end tag, and the start of the rest of the text.

I see that the first search starts at the beginning of the string, rather than at "offset". That seems odd.

We’ll have to double-check how findMatchingEndTag() works, but the rest of the constructor looks straightforward: if there are more tag levels, turn the body into a new Parse (and set body to null). If there’s a nested table, parse the table and set the body to "". (That seems odd also, like it’s throwing away any non-table stuff. I’m not sure what the "" body accomplishes either.) Finally, if there are more tags at the current level, null out the trailer and parse the remaining tags into "more".

FindMatchingEndTag() looks like an implementation of the parenthesis-balancing rule – add 1 every time you see a left parenthesis, subtract 1 every time you see a right parenthesis. If you’re balanced, you’ll have a net of 0.

So I have an answer about nested tables: it’s trying to handle them. I’m seeing a little weirdness that makes it look like a nested table is the only thing retained inside a cell. But at least I know it’s trying. I’ll make some tests to fill in what I’m seeing. I only have a few minutes left, so I want to move on to the htmlToText() part of the code.

Html to Text

The htmlToText() routine has four steps: normalize line breaks, remove non-break tags, condense white space, and unescape. Normalizing line breaks turns <br> and strings of <p> tags into <br />.

Removing non-break tags is a little tricky-looking, but it basically squeezes out tags other than the normalized break tags we just produced. The method "looks forward" to see an end-of-tag; if it’s there, it trims out the tag and looks at the rest of the string.

Condensing white space applies the rule: convert multiple blanks to a single blank, convert a "160" to a space, and convert &nbsp; to a space. I assume 160 is the code for a non-breaking space in Word’s font.

Unescaping is simple too: br tags are converted to newlines, standard entities such as &lt; are converted to their simple character, and smart quotes are converted to " or ‘ as appropriate.

The result of all this is that text() produces the Parse in straight text form – no tags. This is what fixtures will want when they compare expected values.

Summary

I had three questions:

  • What happens with nested tables? They are apparently handled, although it looks like only the nested table is retained, not anything surrounding it.
  • How do I put stuff inside a Parse? Use the addToBody() method.
  • What happens with spaces? Multiple spaces get converted into one, and non-breaking spaces get converted into one space each.

I’m left with a little bit of question in my mind about why the Parse constructor doesn’t use the offset when it’s looking for the first tag, and about the details of nested tables. But that’s ok; I learned a lot today.

Fit Reading (part 1 of 8)

"Fit" is Ward Cunningham’s "Framework for Integrated Tests". You can pick up a copy by starting from fit.c2.com.

I’ve used it for a while, and looked at some of its code along the way, but hadn’t sat down and really studied it systematically. My plan is to spend an hour at a time, just digging in to what I find and sharing my notes.

 

1. Download and Compile

I download a copy and pull it into my IDE. I see there are some yellow flags (syntax warnings). One of them looks like a style difference:
if (x > y) return x+y; else return x*y;
This is legal but it complains about the fact that there’s an "else" clause following a "then" with a return. I’ll just change things along the way even though I don’t plan to feed this back to Ward.

I also put in an AllTests suite. Method fit.FrameworkTest.testRuns() fails, showing that I haven’t moved the examples directory to the right place. This has happened at least partly because I moved stuff around when I put it in Eclipse. I copy the examples folder in, and create an output/test folder. The test tries to run, but fails the actual test (not just from a bad folder name). But time is ticking, so I’m moving on.

I take a quick look around at the files that are there – a lot of them are fixtures, which I’ve at least subclassed before.

2. Parse

FileRunner and WikiRunner are the two executables I see. WikiRunner says it’s deprecated, so I delete it. FileRunner does three things: read its arguments, create a parse of the document, and run the fixture on the parse.

I know already that Parse is some sort of tree-structure representation of the document (like DOM but simpler). I’ll start by looking at ParseTest. The first test shows that it divides a string into 4 pieces: a leader, a tag, the body (inside the tag), and a trailer. The test just used "body"; I wonder what happens if you nest tags, so I try that (with a new test). Turns out it just accepts the new tags as part of the body. (That suggests how tags fit doesn’t care about are handled – they’re just left in the body.)

The next test shows what happens when tr and td tags are thrown into the mix: parsing is different. Parse has a parameter that tells which tags it’s interested in. It looks like we have a choice: when there are no interior tags, the information is put in "body"; when interior tags exist, they form a new level, inside "parts". "Parts" is another parse, so the whole structure is recursive.

The next tests show that we can navigate to "sibling" nodes using "more". To move deeper into the parse, we use "parts"; to move to a sibling node at a given level, we use "more". I saw mentioned somewhere that Parse has a Lisp-like structure, and this is the place that makes that happen. (Lisp uses "car" to get to the head of a list, "cadr" to get to the second element, "caddr" to get to the third, etc. In the tests, we see code like p.parts.parts.more.body to get to the second cell of a table. My Lisp is rusty, but this gives me something to tie it to.)

The Parse has an abbreviation, "at(i,j,k)". I really think of this as "at(table-index, tr-index, td-index)". The test for this shows that if an index is too big, you get the last element in the list (at that level). Also, there are helper methods to get the first leaf and the last node in a list.
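
A tiny example of both navigation styles (the HTML string is just a stand-in for a real document):

import fit.Parse;

public class NavigationExample {
    public static void main(String[] args) throws Exception {
        Parse p = new Parse("<table><tr><td>first</td><td>second</td></tr></table>");
        System.out.println(p.parts.parts.more.text());  // "second": first row, second cell
        System.out.println(p.at(0, 0, 1).text());       // the same cell via at(table, row, cell)
    }
}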

The next test shows that the parser throws an exception if a tag isn’t found where it’s expected. (So a table that is missing its td tag will complain.)

The Parse has a text() method that prints the human-readable form of an HTML text. It handles & references, blanks, etc.

Finally, there are a pair of tests for helper methods that deal with escape characters and white space.

3. Where are we?

My hour is over. Admittedly, some amount of it went into writing these notes. I’ve added a few tests that clarified some of what is going on. I have a little better understanding of the Parse structure (body versus parts, for example). I still don’t know what it does in the face of nested tables. (This used to be a problem; I think they may have fixed it, but there’s nothing in the tests to say one way or the other.)

I had hoped to get into the code for Parse itself, and that’s my plan for next time.

Procedural and Declarative Tests

Procedural tests focus on a series of steps; declarative tests work from the input and states. Procedural tests are natural to write, but consider whether a declarative test expresses your intent more concisely and clearly.

Test Styles

There are two common styles of writing tests. One style might be called procedural: a test consists of a series of steps, each acting on or testing the state of the system. The second style might be called declarative: given a state of the system and some inputs, test the resulting state and the outputs. (These terms are from the theory of programming languages.) If we were using Ward Cunningham's fit testing framework, we might think of these as prototypically ActionFixtures vs. ColumnFixtures.

Let's look at a small, concrete example: a simple login screen.

We'd like to test that the OK button is highlighted only if both a username and a password have been specified.

Here's a test in the procedural style:

1 enter username bob
2 expect OK inactive
3 enter password pw
4 expect OK active
5 clear username  
6 expect OK inactive
7 clear password  
8 expect OK inactive

Here's a test of the same capability in a declarative style:

  Username Password OK active?
1 none none no
2 bob none no
3 none pw no
4 bob pw yes

There's a sense in which the first test is more natural to write: it tells you what to do, step by step. But consider some advantages of the second style:

  • If we want to know whether a particular test case is covered, the second style shows it more directly. For example, what if the username and password are both empty? It's easy to see that this case has been considered in the second test.
  • The declarative style fails or passes one row at a time. The procedural style is vulnerable, in that if a check in the middle fails, the whole rest of the test may be invalid. (This isn't "free" though – the code connecting the test to the system must be designed to make this test independence work.)
  • The declarative style has all the critical state, inputs, and outputs listed explicitly. In the procedural style, you may have to trace the state through many steps.
  • The procedural style tends to presume it knows details about the interface; this makes it more brittle to change.
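
In Fit terms, the declarative login table above might sit on top of a ColumnFixture along these lines. This is only a sketch: LoginScreen is a hypothetical adapter to the system under test, and a real table's header row would have to use the fixture's own names (username, password, okActive()).

import fit.ColumnFixture;

public class LoginRuleFixture extends ColumnFixture {
    public String username;
    public String password;

    public String okActive() {
        LoginScreen screen = new LoginScreen();
        screen.setUsername("none".equals(username) ? "" : username);
        screen.setPassword("none".equals(password) ? "" : password);
        return screen.isOkEnabled() ? "yes" : "no";
    }

    // Hypothetical stand-in for the real screen or presentation model.
    static class LoginScreen {
        private String user = "", pass = "";
        void setUsername(String u) { user = u; }
        void setPassword(String p) { pass = p; }
        boolean isOkEnabled() { return user.length() > 0 && pass.length() > 0; }
    }
}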

Keeping State

Real systems may involve hidden state: state that affects the system but is not directly set or seen. Consider this example:

This is an accumulator: it provides a running sum and average. Obviously, it must somehow keep track of n, the number of items entered, in order to compute the average.

We can make a declarative test out of this by exposing the hidden state to the test:

n sum data sum' avg' comment
1 100 100 200 100 normal
9 100 0 100 10 add 0
2 2 0 2 0 integer divide

The first two columns, n and sum, represent existing state. Data is the number entered on the screen, and sum' and avg' are the results.

Getting to the hidden state can be tricky. Sometimes the programmers can explicitly expose it as part of a testing interface; other times the state can be reached by a sequence of steps. In this case, we could set up the proper state for each row by doing this:

Hit the clear button
Repeat n-1 times: add 0
Add sum
[Optional: check sum]

Then the rest of the row might be interpreted as:

Add data
Check sum
Check avg

This puts some burden on the test setup code. But if the setup is not done there, it's probably repeated in a bunch of scripts.
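As a sketch of what that setup glue might look like (the Accumulator class here is hypothetical, standing in for whatever the real system exposes):

class Accumulator {
    private int n = 0;
    private int sum = 0;

    void clear()        { n = 0; sum = 0; }
    void add(int value) { n++; sum += value; }
    int sum()           { return sum; }
    int avg()           { return sum / n; }    // integer divide, as the table assumes
}

class AccumulatorRowSetup {
    // Recreate the hidden state (n, sum) using only visible operations:
    // clear, then n-1 zero entries, then one entry carrying the whole sum.
    static Accumulator stateFor(int n, int sum) {
        Accumulator acc = new Accumulator();
        acc.clear();
        for (int i = 0; i < n - 1; i++) {
            acc.add(0);
        }
        acc.add(sum);
        return acc;    // acc now reports count n and sum "sum"
    }
}

The rest of the row ("add data, check sum, check avg") then becomes acc.add(data) followed by assertions against acc.sum() and acc.avg().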

When to Use the Procedural Style?

Procedural style pays off best when the sequential nature of the feature is what's interesting: when how you got somewhere is as important as where you are. Consider the problem of multiple selection, using click, shift, control, and drag. For setup, imagine a vertical list of items, numbered 1 through 10. Consider this test:

1 click 5  
2 release    
3 expect selected 5
4 expect last 5

Or this one:

1 click 5  
2 drag to 7  
3 drag to 3  
4 release    
5 control    
6 click 7  
7 expect selected 3,4,5,7
8 expect last 7

You could perhaps develop a set of declarative tests for these, but the sequence of actions is what's interesting.

What to do?

Leverage procedural tests for the cases where the sequence of actions is paramount, and there's not a lot of commonality between scripts. Scenario tests can benefit from the procedural style.

A declarative test is like a table you could put in a user manual: it concisely and concretely explains a function. Declarative tests sometimes require extra setup, especially when hidden state is involved, but they're often worth that extra trouble. Declarative tests are good for testing permutations of a business rule.

If all else is equal, favor the declarative style.

These guidelines can boost your test-writing efficiency: they move repetitive actions into the test setup, and let you focus on the interesting part of a test.

[Written February, 2005. Brian Marick suggested the terms, though I'm not sure I picked the set he liked best.]

Tools – Especially JUnit and Fit

I'm reflecting on the most important tools I've been using this past year for my Java projects.

 

  • IntelliJ Idea – A fine IDE. My current default.
  • Eclipse – I've used it some, and found it a little clunkier than IntelliJ. But I plan to move toward it more this coming year.
  • P4 – Perforce source control system. It's free for a single user, and does a nice job.

I've used two primary testing tools:

  • JUnit – for unit testing.
  • Fit – for system/acceptance testing.

JUnit is a simple unit testing framework. Suppose we're testing a Stack class. Here's an example test. The framework locates test methods via reflection (it treats methods whose names start with "test" as tests):

import junit.framework.*;
import java.util.Stack;

public class TestClass extends TestCase
{
   public void testSomething() {
     Stack stack = new Stack();
     stack.push("Test string");

     String result = stack.pop().toString();

     assertEquals("Expected last value pushed", "Test string", result);
   }
}

(This test case fits a pattern I call "Arrange, Act, Assert": it sets up data, calls a method under test, and then verifies that it worked as expected.)

JUnit lets you create setUp() and tearDown() methods that will be called before and after each test method in a class. It has a number of other assertion methods (assertTrue, assertNull, etc.).
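For example, a sketch in the same JUnit 3 style, using java.util.Stack as the class under test:

import java.util.Stack;
import junit.framework.TestCase;

public class StackTest extends TestCase {
    private Stack stack;

    protected void setUp() {
        stack = new Stack();    // runs before each test, so every test gets a fresh stack
    }

    public void testStartsEmpty() {
        assertTrue(stack.isEmpty());
    }

    public void testPushThenPop() {
        stack.push("x");
        assertEquals("x", stack.pop());
    }
}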

Fit is a tool for higher-level testing, available from fit.c2.com. It allows a Customer or tester to write tests in a spreadsheet (a very familiar interface). The tests are exported to HTML, and fit reads and runs them.

I typically use fit with a program known as FitNesse. FitNesse is a standalone wiki that knows how to run fit tests, either individually or in suites.

I can't create tables in this blog, so you'll have to use some imagination. But a fit test might look like this:

| myprog.fit.NameFixture |
| in | formatted() |
| Joe Fish | Fish, Joe |
| John H. Doe | Doe, John H. |
| Queenie | Queenie |
| Dodge, B. R. | Dodge, B. R. |

Fit has a number of builtin test classes that you can extend.
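The fixture behind that table would be a small ColumnFixture. Here's a sketch of how it might look; the formatting rule (swap around the last space, pass through names that already contain a comma or have no space) is my guess from the example rows, not the real code.

package myprog.fit;

import fit.ColumnFixture;

public class NameFixture extends ColumnFixture {
    public String in;    // bound to the "in" column

    public String formatted() {    // computed for the "formatted()" column
        if (in.indexOf(',') >= 0 || in.indexOf(' ') < 0) {
            return in;    // already "Last, First" form, or a single name
        }
        int lastSpace = in.lastIndexOf(' ');
        return in.substring(lastSpace + 1) + ", " + in.substring(0, lastSpace);
    }
}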

Both of these testing tools can help change how you look at tests. They're both good additions to your set of tools.

Happy new year to all! –Bill

Fit Spreadsheet – Output

Starting Fit

To use fit, you create a web page that has tables in it; the tables specify tests. (There are other options but that is easiest.) In this case, I'm using Microsoft Word™ and saving the file in HTML format.

 

The fit FileRunner acts as a filter: given a web page, it copies text outside of tables as is, and runs your program on the table entries. Some table entries represent tests that can pass or fail; fit colors them green or red respectively. The output is another HTML file.

 

Fit will also put a summary in the file if you put in a table like this:

 

fit.Summary

counts 0 right, 0 wrong, 0 ignored, 0 exceptions
input file C:\P4\FirstFit\fit\FirstFit-in.htm
input update Thu May 01 10:51:42 EDT 2003
output file C:\P4\FirstFit\fit\FirstFit-out.htm
run date Thu May 01 10:58:28 EDT 2003
run elapsed time 0:00.05

 

With this tool, you don’t manipulate screen elements directly. Instead, you work with an abstraction of them. To me, it feels like talking to somebody over the phone, trying to tell them how to use an application. (“In cell cee seventeen, put equals a one; then go to a one and type ‘fish’.”)

 

This article shows the output from running fit; a companion version of the page shows the input.

Programming and Configuration Notes                                                                

Fit is a tool for customers and testers, but programmers will use it as well, and will have to write some of the fixtures the team uses. In this paper, I’ve tried to use the framework mostly straight out of the box.

 

The CLASSPATH needs to include fit.jar (both in the DOS window and the IDE). The runner command I’m using is:

java fit.FileRunner FirstFit-in.htm FirstFit-out.htm

 

When I do this on the file I have so far, it creates the output file and writes this to the console:

0 right, 0 wrong, 0 ignored, 0 exceptions

Fixtures

Tables in the input file have the name of a fixture in the first row. A fixture is a class that knows how to process the table. Fit comes with several fixtures built in, and programmers can create others.

 

One simple fixture is the ColumnFixture. In this fixture, the first row is the fixture name, and the second row has the names of data. If a name ends without parentheses, it is regarded as a field to fill in; with parentheses, it’s treated as a method (function) call. The fixture fills in all the data fields, and then calls the methods to verify that they return the expected results.

 

Another standard fixture is the ActionFixture. This one consists of a series of commands. These include:

  • start classname: Creates an object of the specified class
  • enter field value: Sets the field to the value
  • press button-name: Calls the method corresponding to the button
  • check method value: Checks that the method returns the expected value

 

The ActionFixture ignores anything past the first three columns; we’ll use the fourth column for comments.
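As a sketch of what an actor class looks like under the stock ActionFixture, where the second cell names a method on the actor (one argument for enter, none for press and check), here's a hypothetical actor for the login-screen example from earlier; the class and method names are mine.

import fit.Fixture;

public class LoginScreen extends Fixture {
    private String username = "";
    private String password = "";
    private boolean loggedIn = false;

    public void username(String value) { username = value; }    // enter | username | bob
    public void password(String value) { password = value; }    // enter | password | pw

    public void ok() {                                           // press | ok
        loggedIn = okActive();
    }

    public boolean okActive() {                                  // check | okActive | true
        return username.length() > 0 && password.length() > 0;
    }

    public boolean loggedIn() {                                  // check | loggedIn | true
        return loggedIn;
    }
}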

 

So, we’re finally ready to start our application.

 

| fit.ActionFixture |
| start | Spreadsheet | Create a new spreadsheet. |

 

This test doesn’t ask for much, but of course it fails. (There isn’t any code yet!)

            0 right, 0 wrong, 0 ignored, 1 exceptions

 

Programmer Notes

The exception is thrown because the Spreadsheet object doesn’t exist. To create it as simply as possible, make it extend Fixture:

 

import fit.Fixture;

public class Spreadsheet extends Fixture {}

 

This gets us back to

            0 right, 0 wrong, 0 ignored, 0 exceptions

 

I’ve put together stubs for the fixtures used in this article: Spreadsheet.java, SpreadsheetFormula.java, and Address.java; here’s a zip file containing all three.

A Few Stories

We have several things we want our spreadsheet to do:

  • Track the contents of cells
  • Distinguish data from formulas
  • Provide both data and formula views of cells
  • Support “+” for appending strings, “’” for reversing strings, “()” for grouping, and “>” for string containment.

Cells

The spreadsheet has a number of cells, each of which has an address. Cells contain string data or formulas.

 

We’ll assume several screen elements:

  • address – the address we’re working with; something like “B19”
  • cell – the cell contents we enter (to the last “address”)
  • formula – the cell contents as entered (for the last “address”)
  • display – the cell contents as seen when the formulas are applied (for the last “address”)

 

We’ll start with a simple data cell.

| fit.ActionFixture | | | Comments |
| start | Spreadsheet | | |
| enter | a1 | abc | |
| check | a1 | abc | Text in cell |
| check | formula | abc | Formula is same. (Looks in last-mentioned cell.) |

 

Now let’s add in a formula cell. (Note that this table omits the “start” line; this means it’s working on the same object as before. This lets us not repeat the setup, but it also makes the tests less independent.)

 

| fit.ActionFixture | | | Comments |
| enter | a1 | abc | |
| enter | b1 | =A1 | Simple copying formula |
| check | formula | =A1 | Formula is there |
| check | a1 | abc | Original text in A1 |
| check | b1 | abc | Text was copied to B1 |

 

The essence of a spreadsheet is the automatic updates. Let’s change A1 and see it happen.

 

| fit.ActionFixture | | | Comments |
| enter | a1 | abc | |
| enter | b1 | =A1 | Simple copying formula |
| check | b1 | abc | Copied value |
| enter | a1 | revised | Update A1 |
| check | b1 | revised | Automatically updates B1 |

 

 

We already have quite a few elements in use, though we haven’t specified exactly what is valid. Let’s just note the “specification debt” and move on.

  • What can a cell hold? Empty string, other string, formula starts with “=”
  • What’s a valid address? Letter plus digits; ignore leading 0s; case-insensitive.
  • What’s a valid formula? So far, we’ve just used a simple cell reference, but we want operators too.
  • What happens when a cell has an invalid formula?
  • What happens when a cell refers to a cell containing a formula?
  • What happens when formulas form a loop?

 

We’ll pursue all these, but let’s start with formulas.

Formulas

Formulas can reference formulas.

 

| SpreadsheetFormula |
| a1 | b1 | c1 | d1 | a1() | b1() | c1() | d1() |
| data | =A1 | =B1 | =C1 | data | data | data | data |

 

  

Formulas get more interesting when there are operators available. The reverse operator (‘) is probably a good one to start with.

| SpreadsheetFormula |
| a1 | b1 | b1() |
| abc | =A1' | cba |
| abc | =A1'''' | abc |

 

 

 

The most useful string operator is probably append (+):

| SpreadsheetFormula |
| a1 | b1 | c1 | b1() | c1() |
| abc | =A1+A1 | blank | abcabc | blank |
| abc | def | =A1+B1+B1+A1 | def | abcdefdefabc |

 

 

We have enough features that we can demonstrate an identity: (XY)’=Y’X’. We don’t have parentheses yet, but we can simulate this by putting the parts in separate cells.

 

| SpreadsheetFormula |
| a1 | b1 | c1 | d1 | e1 | d1() | e1() |
| abc | xyz | ignored | =A1+B1 | =D1' | abcxyz | zyxcba |
| abc | xyz | =B1' | =A1' | =C1+D1 | cba | zyxcba |

 

  

Parentheses can be used to group operators. Let’s re-do the previous test, allowing parentheses:

| SpreadsheetFormula |
| a1 | b1 | c1 | c1() |
| abc | xyz | =(A1+B1)' | zyxcba |
| abc | xyz | =B1'+A1' | zyxcba |

 

 The operator “>” tells whether one string contains another one. If the first string contains the second, the result is the second. If the first string doesn’t contain the second, the result is an empty string.
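In code, that rule is a one-liner (a sketch; the class and method names are mine):

class Containment {
    // ">" rule: if the first string contains the second, the result is the second;
    // otherwise the result is the empty string.
    static String of(String first, String second) {
        return first.indexOf(second) >= 0 ? second : "";
    }
}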

 

| SpreadsheetFormula |
| a1 | b1 | c1 | c1() |
| banana | ana | =A1>B1 | ana |
| banana | bab | =A1>B1 | blank |

 

We haven’t talked about precedence yet. The ‘ and () operators have the highest precedence, then +, then >. A1+B1+C1 is a legal expression, but A1>B1>C1 is not.
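Spelled out, my reading of the grammar so far: a formula is "=" followed by an expression; an expression is one or two +-chains joined by a single >; a +-chain is one or more primed primaries; a primary is an address or a parenthesized expression. Here's a sketch of a recursive-descent check for that grammar. FormulaSyntax and its methods are mine, not the article's code, and it checks syntax only, so the leading-zero and row-0 address rules still need a separate check.

public class FormulaSyntax {
    private final String text;
    private int pos;

    private FormulaSyntax(String text) { this.text = text; }

    public static boolean isValid(String formula) {
        if (formula == null || !formula.startsWith("=")) return false;
        FormulaSyntax parser = new FormulaSyntax(formula.substring(1));
        return parser.compare() && parser.pos == parser.text.length();
    }

    private boolean compare() {                    // concat ( ">" concat )?
        if (!concat()) return false;
        if (match('>')) return concat();           // at most one ">"
        return true;
    }

    private boolean concat() {                     // reversed ( "+" reversed )*
        if (!reversed()) return false;
        while (match('+')) {
            if (!reversed()) return false;
        }
        return true;
    }

    private boolean reversed() {                   // primary "'"*  (postfix binds tightest)
        if (!primary()) return false;
        while (match('\'')) { /* consume postfix reverses */ }
        return true;
    }

    private boolean primary() {                    // "(" compare ")" | address
        if (match('(')) return compare() && match(')');
        return address();
    }

    private boolean address() {                    // one ASCII letter, then one or more digits
        if (pos >= text.length() || !isAsciiLetter(text.charAt(pos))) return false;
        pos++;
        int digits = 0;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) { pos++; digits++; }
        return digits > 0;
    }

    private boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    private boolean match(char expected) {
        if (pos < text.length() && text.charAt(pos) == expected) { pos++; return true; }
        return false;
    }
}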

 

| SpreadsheetFormula |
| a1 | b1 | c1 | c1() |
| abc | xyz | =A1+B1' | abczyx |
| abc | xyz | =(A1+B1)' | zyxcba |

 

 

| SpreadsheetFormula |
| a1 | b1 | c1 | d1 | e1 | e1() |
| abcdef | ghijkl | e | hgf | =A1+B1>C1+D1' | efgh |

 

Backfill

We have several questions left open:

  • What can a cell hold? Empty string, other string, formula starts with “=”
  • What’s a valid address? Letter plus digits; ignore leading 0s; case-insensitive.
  • What happens when a cell has an invalid formula?
  • What happens when formulas form a loop?

 

The previous tests made a quick pass through the system. I think of them as generative: they help define the essence of the system. But questions like the above require us to fill in the gaps. I think of tests that do things like check “corner cases,” error cases, and how features interact as elaborative; they fill in what we already have. They might find problems, but they may well work already, depending on how the system was built.

What a cell holds

We already have test cases where a cell holds a string, and where a cell holds a formula, but it would be prudent to check that the operators work correctly on empty strings. If e is the empty string and x is a non-empty string, we expect:

            e’ = e

            e+e=e

            e+x=x

            x+e=x

            e>e=e

            e>x=e

            x>e=e

 

As I go to write the test, I realize that we never specified what a cell starts with. The answer, of course, is the empty string. So we’ll rely on that: A1 will be empty.

 

| fit.ActionFixture | | | Comments |
| start | Spreadsheet | | |
| check | a1 | | Verify that cell starts empty. |

 

Then we can verify those rules about working with the empty string:

 

| SpreadsheetFormula |
| a1 | b1 | c1 | c1() | Comment |
| blank | blank | =A1' | blank | e’=e |
| blank | blank | =A1+A1 | blank | e+e=e |
| blank | blank | =A1>A1 | blank | e>e=e |
| blank | abc | =A1+B1 | abc | e+x=x |
| blank | abc | =B1+A1 | abc | x+e=x |
| blank | abc | =A1>B1 | blank | e>x=e |
| blank | abc | =B1>A1 | blank | x>e=e |

 

Valid Addresses

There are two places we use addresses: in the address field and in the cells with formulas. When we get a “real” (graphical) interface, the address will mostly be implicit. But even so, we’ll test it here just to be safe.

 

Let’s use a ColumnFixture for this: we’ll put address in one column, valid() in another, and standardized() in another. (A programmer will have to write the new fixture for us.)

 

The rules are: a valid address is a letter (A-Z, a-z) followed by one or more digits (0-9). Case is ignored. Leading 0s are ignored. “0” is not a valid row number.
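Here's a sketch of what that fixture might look like; this is my guess at an implementation of the stated rules, not the actual Address.java from the zip file above.

import fit.ColumnFixture;

public class Address extends ColumnFixture {
    public String address;    // input column

    public boolean valid() {
        // one ASCII letter, then one or more digits, and a row number other than 0
        return address.matches("[A-Za-z][0-9]+") && !rowNumber().equals("0");
    }

    public String standardized() {
        if (!valid()) return "";
        return address.substring(0, 1).toUpperCase() + rowNumber();
    }

    private String rowNumber() {
        // drop leading zeros, but keep at least one digit ("000" becomes "0")
        return address.substring(1).replaceFirst("^0+(?=.)", "");
    }
}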

 

| Address |
| address | valid() | standardized() |
| A1 | true | A1 |
| a1 | true | A1 |
| A9874 | true | A9874 |
| Z1 | true | Z1 |
| z1 | true | Z1 |
| Z3992 | true | Z3992 |
| z3992 | true | Z3992 |
| AA393 | false | |
| zX202 | false | |
| é17 | false | |
| 1 | false | |
| ~1 | false | |
| ~D1 | false | |
| y&1 | false | |
| ^ | false | |
| X392% | false | |
| H001 | true | H1 |
| j00010 | true | J10 |
| e000 | false | |
| A0 | false | |
| z0 | false | |

 

Let’s make sure that case-insensitivity works in formulas:

 

| SpreadsheetFormula |
| a1 | b1 | b1() |
| abc | =A1+a1 | abcabc |

 

Formula Errors

If a formula contains an error, we’d like it to display as “#error.” We’ll put all the invalid names from the previous table into formulas, and verify that formulas behave correctly. Then we’ll try various improper combinations of operators.

| fit.ActionFixture |
| start | Spreadsheet | | Create a new spreadsheet. |
| enter | a1 | =AA393 | Bad address |
| check | a1 | #error | Marked as error |
| check | formula | =AA393 | Formula as written |
| enter | a1 | =A2 | Change to valid address |
| check | a1 | | Make sure #error is cleared |

 

 

| SpreadsheetFormula |
| a1 | a1() | Comment |
| =zX202 | #error | Two letters |
| =é17 | #error expected / ? actual | Non-ASCII |
| =1 | #error | No letters |
| =~1 | #error | No letters |
| =~D1 | #error | Unacceptable character |
| =y&1 | #error | Extra character |
| =^ | #error | No letters/digits |
| =e000 | #error expected / ? actual | Too many digits |
| =A0 | #error expected / ? actual | Invalid row # |
| =z0 | #error expected / ? actual | Invalid row # |
| = | #error | Missing formula |

  

Then we’ll get to some operators:

| SpreadsheetFormula |
| a1 | a1() | Comment |
| ='A2 | #error | ' should be postfix |
| ='A2' | #error | Can’t be before and after |
| =A2+ | #error | Need other term |
| =A3+A4+ | #error | Need other term |
| =A2++A3 | #error | Missing term |
| =A2+'+A3 | #error | ‘ isn’t a term |
| =A2'''+A3 | blank | OK to mix things |
| =A2) | #error | Missing ( |
| =(A2 | #error | Missing ) |
| =((((((((((((A2)))))))))))) | blank | OK – big expression |
| =((((((A2+(A3))))+A4) | #error | Unbalanced – too few ) |
| =(((A2>A3 | #error | Unbalanced – too few ) |
| =(A2>A3))) | #error | Unbalanced – too many ) |
| =A2>A3> | #error | Can’t trail > |
| =A2>A3>A4 | #error | Can’t repeat > |

 

Loops

If a formula uses itself (directly or indirectly), we don’t want it to loop forever trying to figure it out. Instead, we’d like the display to be “#loop.”
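One common way to get that behavior is to track which cells are currently being evaluated and treat a revisit as a cycle. A sketch under my own assumptions, handling only the simple =REF formulas of the first row (the real fixtures surely differ):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CellEvaluator {
    private final Map<String, String> contents = new HashMap<String, String>();
    private final Set<String> inProgress = new HashSet<String>();

    void set(String address, String value) {
        contents.put(address.toUpperCase(), value);
    }

    String display(String address) {
        String key = address.toUpperCase();
        if (!inProgress.add(key)) {
            return "#loop";                        // re-entered a cell we're still evaluating
        }
        try {
            String cell = contents.containsKey(key) ? contents.get(key) : "";
            if (!cell.startsWith("=")) {
                return cell;                       // plain data
            }
            return display(cell.substring(1));     // handles simple "=REF" formulas only
        } finally {
            inProgress.remove(key);
        }
    }
}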

 

| SpreadsheetFormula |
| a1 | b1 | c1 | d1 | e1 | a1() | e1() |
| =A1 | blank | blank | blank | blank | #loop | blank |
| =B1 | =C1 | =F1+D1 | =E1 | no-loop | no-loop | no-loop |
| =B1 | =C1 | =F1+D1 | =E1 | =A1 | #loop | #loop |

 

Conclusions

This paper has demonstrated a set of tests using the fit acceptance testing framework. Some things to note:

  • The tests here have been written as if a customer specified them, without much demonstration of the programming cycle. But programmers can work with these tests in much the way they would with JUnit.
  • The tests are written without benefit of the feedback of a working system. (I wrote just enough code to make the first test not throw an exception.)
  • The tests look at only part of the system: the core functionality. There are other aspects of a real application that we aren’t testing. (For example, it may be non-trivial to connect a screen to the core code.)
  • Even a small application such as this requires a fairly large set of tests. With more programming work on the fixtures, we might be able to reduce some of the noise. Real applications will organize tests into multiple files, and will have to pay more attention to the challenges of consistency, test independence, and feature interaction.
  • It feels smooth to mix light natural-language specification with formal, executable tests.
  • Fit has a number of features we haven’t used.

 

I’ve heard that many teams use xUnit for unit testing, but still struggle to get customer tests in place before (or even after) stories are implemented. I hope frameworks such as fit can help lower the barriers to this crucial task.

 

fit.Summary

counts 94 right, 4 wrong, 0 ignored, 0 exceptions
input file C:\P4\FirstFit\fit\FirstFit-in.htm
input update Thu May 01 10:51:42 EDT 2003
output file C:\P4\FirstFit\fit\FirstFit-out.htm
run date Thu May 01 10:58:28 EDT 2003
run elapsed time 0:00.14

 


[Written April 20, 2003; revised April 26, 2003, to correct mis-stated identity & in response to Ward Cunningham's great suggestions about improving the fixtures; 2012 – the WordPress version is designed to simulate the original look.]