State Machines, part 4: Testing

State machines are useful, but how do we know they’re correct? We’ll look at several aspects of testing state machines: the machines themselves, and the interpreter and direct code implementations.

The “State Machines” Series

Levels of Testing

When using state machines, consider multiple levels:

The state machine itself: If this is wrong, it doesn’t matter how perfect the implementation is.
For interpreters:
- The encoding of the state machine: It’s easy to make transcription errors!
- The interpreter and its data structures: Make sure it works correctly for any state machine.
For direct code:
- Manually written code: Manual code spreads the state machine across all the code, so you have to test every nook and cranny.
- Generated code: You need to be confident both in the generator and the code it generates.

Testing the state machine itself is the most critical part: a perfect implementation of a wrong machine won’t work. You can test a state machine before a single line of code exists.

You can test a state machine before a single line of code exists.

Testing by Systematic Review

States

Systematically walk through each state, and ask:

Why does this state exist?
Have you defined transitions for each possible event?
If you default to error transitions, do you understand the expected behavior?
Are there any missing or unused states?

Make sure you’re clear on error handling in particular. Confusion about this is very common. You need to understand what your system will do when an error is detected. Does it stop handling events? Will it restart? Does it need to do some sort of synchronization to get to the start state?

Are there real-life consequences of hitting an error state? Do errors need to be logged or handled in other special ways?

Events

Examine events:

Do you have a complete list of events?
Does your list have events that can’t actually happen?
Do your events have any additional information?
If so, is there any ambiguity in those values? (For example, implicit units, unspecified formats, etc.)

The state machine is a way to describe behavior. However, it is an abstraction, and the real world is more complex.

Events in real life can be much more complex than the simple abstract model of “a stream of well-defined events”.

Is there any chance of the same event appearing multiple times?
Can events be lost or delayed?
Are events queued or buffered? What if the buffer is exceeded? How does the system deal with events arriving more quickly than they can be handled?
What happens if the stream just stops producing events?

Paths

One of the best ways to check state machines is to walk through how they behave with streams of events:

Generate possible event sequences
Simulate each sequence going through the machine
Check that it behaves and ends up as you expect

As you generate these sequences, consider:

Common sequences
Critical sequences
Cases where you’ve seen problems before
Corner cases that test obscure situations
“Loop” cases where the path goes through some states more than once
Error situations

If your state machine has loops, you can’t be exhaustive. Your context will help you decide how much to test: if you’re working on a pacemaker, or the next generation of Java locks, you need to be more careful!

Running Backwards: Generate Event Sequences

It may not be obvious, but you can run a state machine “backwards”. Instead of accepting event sequences, generate event streams.

To do this, start in the initial state and move from state to state, picking which edge to use. (You can pick randomly, or with a goal in mind.) As you traverse, make a list of the corresponding events.

There are several uses for these:

Check whether the streams look reasonable for your system.
Check that you were in the states you expected as you were generating.
Save the lists, and use them for test cases in your final implementation.

Generators have long helped validate compilers for programming languages: generate valid and invalid programs, and make sure the compiler works properly for them. (Full implementations need more than a state machine, but it is a helpful component.)

Generators have also helped to test phone systems. First, assign weights to the transitions to make a probabilistic model of real-world event streams. Then test the system with the generated events, and assess its performance, stability, and so on.

Testing State Machines Extensions

Are you using extensions to state machines?

actions on state entry
actions on state transition
conditions restricting when states change
outputs beyond just the final state

Since extensions are even less standardized than state machines, make sure you’re clear on how you expect them to work.

We’re used to code that’s in one place. Sure, it loops and makes method calls, but it’s static and unchanging.

State machines are different. When you use an extension that attaches code to states or transitions, verification is tricky because the the code fragments are spread all over. It’s the original spaghetti code.

The core problem is that instead of one program, each stream gets a unique program, the accumulation of code fragments for each state or edge it traverses.

For state machines extended with code, each possible stream represents a unique program, the accumulation of code fragments for each state or edge it traverses.

The testing problem is no longer “Does this code work?” but rather “For each possible stream, does its cumulative code work?”

Interpreter State Machines

The Data Structure: Test that the data structure (e.g., table or graph) performs as intended. This is easy for a simple table, and only slightly harder for a graph. If you compress tables, make sure they behave properly. The data structure is suitable for Test-Driven Development (TDD).
The Encoding: Your state machine needs to be installed in the data structure. Make sure you haven’t missed any states, or entered any states or transitions incorrectly. If you convert the machine manually (as is common), desk-check it for errors, and look for errors by testing overall behavior.
The Interpreter: While the interpreter loop is fairly small, it grows in complexity as you address more and more concerns. The interpreter can be built using TDD.
The Overall Behavior: Remember all those event sequences you identified earlier? Many of them are good tests to make sure the overall system is behaving as expected.

Directly Coded State Machines

goto, switch: These mapping approaches are similar in spirit to the interpreter-based machines. Instead of encoding to a table or graph, they systematically encode to the text of a program.

Review the encoding to make sure it maps to the intended state machine.

The implementation tends not to be incremental, just transliterated en masse from the original machine. You can split development a bit by getting the state machine right before adding any extensions.

Structured programming: You can develop structured code incrementally with an approach such as TDD. Major paths through the program usually represent distinct parts of the state machine. For example, compilers sometimes define a “bootstrap subset” that’s just enough of a new language to write its own compiler. This subset typically recognizes a much smaller set of possibilities. You can do the same: identify the critical parts for focused testing.

State pattern: You can develop and test incrementally, either splitting on the states or the events. As with other approaches, you can develop the state machine part before tackling any extensions.

For all approaches: The bare minimum is to test the state-to-state transitions, but it’s common to also use system-level tests, particularly when using extensions.

You face the common challenge for all unit testing vs system testing:

Does it do what you told it to (step by step)?
Does it do what it’s supposed to overall?

When you extend state machines with other code, it’s hard to asses them without using realistic streams.

State Machine Compilers

The interpreter, switch, and goto approaches systematically encode a state machine. It’s natural to think, “I know, I’ll write a tool to do that.” (You might also profitably think, “I wonder if there’s already a tool for that?”:)

That can work, and can give you leverage to quickly develop multiple state machines. But this approach introduces another layer of indirection, since you’re writing code to generate code.

For testing, you must consider each layer:

Do the parts work?
Does the whole thing work to generate the code you expect?
Does that generated code in fact work?

Conclusion

State machines can be checked on their own (before code is written): you’re checking a specification. Check the code to implement a state machine as well.

When you’re testing state machines with extensions, you will need to test with event streams (as well as any incremental testing you do during development). A series of event streams is generally the only way to test all the code combinations in an extended machine.

In summary, test the state machine, and test your code.

Additional Reading

“How to Unit Test Finite State Machines,” by Urs Enzler. https://www.planetgeek.ch/2011/05/17/how-to-unit-test-finite-state-machines/. Retrieved 2020-11-03.

“Principles and Methods of Testing Finite-State Machines – A Survey,” by David Lee and Mihalis Yannakakis. https://www.researchgate.net/publication/2985061_Principles_and_Methods_of_Testing_Finite_State_Machines-a_Survey/link/5745c2a608aea45ee8560b31/download. Retrieved 2020-11-03. Related presentation by Kuo-Wen Lo.

“Testing, Optimization, and Games,” by Mihalis Yannakakis. http://www-verimag.imag.fr/EVENTS/2006/ATVA/ATVAtutorials/Yannakakis_ATVA2006_Tutorial.pdf. Retrieved 2020-11-03.

“Testing State Machines,” by Matthew Jones. https://accu.org/journals/overload/17/90/jones_1548/. Retrieved 2020-11-03.

Followup – Comments from Readers

Think about “Should Not” and “Cannot” happen transitions. “Should Not” means external system (eg. hamfisted user) can create such an event, we must handle that possibility. “Cannot” means, it cannot, by design, happen. If it does, we have a bug, shout loud and fix fast.
* In review step always look for sinks. (eg. State 4 in your pic) where there is no route away from it. Ask what happens then.
* Look for invariants. Encode an invariant check, and check it on entry to and exit from every state. Then run your test suite.
* Attack with a fuzzer.

If the next state depends on _anything_ other than the current state and current event, you don’t have a state machine, you have spagbol. Pseudo state machines are the bane of my existence.
John Carter

Johannes Link suggested using stateful property-based testing to validate state machines.

Doğan Ulus describes how non-deterministic automata can be matched using a regular-expression-based technique: “Sequential Circuits from Regular Expressions Revisited.”