How Do You Do a Performance Spike?

A performance spike can give you crucial information about your system’s potential performance.

  1. Decide what you want to measure.
  2. Give yourself a deadline.
  3. Sketch a simplified system model and load mix.
  4. Decide which configurations to test.
  5. Run the tests.
  6. Analyze and report the results.
  7. Conclusion

1. Decide what you want to measure.

In some cases, you just want to see how quickly you can do something. For example, you might have competing algorithms to assess, or you just want to see how fast you can make something run.

For batch-oriented systems, you often want to measure throughput: items processed per unit of time. For example, a file-based compiler might compile 500K lines of code per minute on a 400 MHz x86 system.

For interactive systems, the focus is both response time and throughput. Response time is how long it takes from start to finish for a single request. For example, a web server might handle 200K requests per hour with sub-second response time.

When you’re testing, you’re well advised to measure system load as well. This makes it easier to predict performance on different systems. For example, if an I/O bound system is loading 10% of the CPU, you might be able to put it on a slower box with almost no performance impact. Without the load measure, it would be hard to know this.
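
The measurements above can be sketched as a small harness. This is a minimal sketch, with a stand-in operation rather than a real request; on a real system you’d replace the lambda with a call to the system under test and run a monitor (e.g., top or sar) alongside to capture load:

```python
import time

def measure(operation, n_requests):
    """Run `operation` n_requests times; return throughput and per-request times."""
    response_times = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        operation()                                  # the request under test
        response_times.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return n_requests / elapsed, response_times      # throughput, response times

# A stand-in "request" -- replace with a real call to the system under test.
throughput, times = measure(lambda: sum(range(1000)), n_requests=500)
print(f"{throughput:.0f} requests/sec, worst response {max(times) * 1000:.2f} ms")
```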
 

2. Give yourself a deadline.

Playing with performance ideas is one of those things that can go on for too long. You’ll never run out of things you could test.

So, give yourself a time budget. Decide that you’re going to find out what you can in 10 minutes, or an hour, or a day, or two. You’ll find that you can learn a lot in a short period of time – you don’t need weeks of experiments.
 

3. Sketch a simplified system model and load mix.

Make a sketch of the major components of your system. Identify all the points where you might want to measure time. For example, suppose you want to evaluate a web system that talks to a database:

Model: client to net, to web server, to database (t1 to t6). Here t1 is when the client sends the request, t2 when the web server receives it, t3 when the database call begins, t4 when it returns, t5 when the web server sends the response, and t6 when the client receives it.

Your system may already log the “time probepoints”, or you may have a profiling tool that can do so. In this example, response time as seen by the client is (t6 – t1). Response time from the server’s perspective (ignoring the network) is (t5 – t2).

You’ll also want to know your load mix: what proportion of the transactions are of each type. In our web server, we might have 75% static pages (which don’t touch the database) and 25% dynamic pages (each resulting in a database insert).
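
Once the probe points log timestamps, the breakdown is simple arithmetic. A minimal sketch (the probe numbering t1–t6 follows the model above; the sample readings are invented):

```python
def breakdown(t):
    """Split a request's time among network, web server, and database,
    given probe timestamps t[1]..t[6] (t1 client sends .. t6 client receives)."""
    return {
        "client response": t[6] - t[1],                    # what the client sees
        "server response": t[5] - t[2],                    # ignores the network
        "network":         (t[2] - t[1]) + (t[6] - t[5]),
        "web server":      (t[3] - t[2]) + (t[5] - t[4]),
        "database":        t[4] - t[3],
    }

# Invented probe readings, in seconds:
t = {1: 0.000, 2: 0.300, 3: 0.301, 4: 0.751, 5: 0.752, 6: 1.052}
for part, seconds in breakdown(t).items():
    print(f"{part:15s} {seconds * 1000:6.0f} ms")
```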
 

4. Decide which configurations to test.

You almost always have a number of options for deploying a system, or tuning it once deployed. You often want to assess design alternatives (“relational or object database?”).

Make a list of your options:

  • 1-10 web server boxes, with
  • 1-4 CPUs each.
  • 10K, 20K, …, 100K transactions/hour
  • Java or C++
  • 1-2 CPUs on the database server
  • etc.

If you had to try all the possible options, you’d be at it a while: just the options above create 10*4*10*2*2=1600 choices.
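
You can enumerate the combinations mechanically and watch how quickly they multiply. A sketch (the option values mirror the list above):

```python
from itertools import product

options = {
    "web servers":   list(range(1, 11)),                    # 1-10 boxes
    "CPUs per box":  list(range(1, 5)),                     # 1-4 CPUs each
    "load (tx/hr)":  list(range(10_000, 100_001, 10_000)),  # 10K..100K
    "language":      ["Java", "C++"],
    "database CPUs": [1, 2],
}

configurations = list(product(*options.values()))
print(len(configurations))  # 10 * 4 * 10 * 2 * 2 = 1600
```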

To the extent possible, make your decisions independently. For example, to decide if Java is “good enough,” you might test it with 1 or 10 CPUs, but not all the combinations in between. Especially when you’re doing early spikes, you can treat the parts of the system independently. In the web example, I might know I can get sub-second response time provided I spend <500 ms in the network, <300 ms in the web server, and <200 ms in the database.

Think of your tests as a scientific experiment. Your goal is to set up a fair test. If two situations are exactly the same other than one thing, you can attribute any differences to that thing. Be careful when you’re changing two options at once. For example, if you compare the system on a web server with 1 × 500 MHz CPU vs. a server with 4 × 1 GHz CPUs, and you see improved performance, you can’t tell whether it was the increase in the number of CPUs, the increase in speed, or both, that had the effect.

One test I recommend trying is a series of increasing transaction loads, run on two configurations if possible: one as close as possible to what production will be (so you know what to expect “day one”), and another as close to the maximum as possible (so you know the limits of your design). The progression needn’t be linear; you can test whatever loads work for you (e.g., 10K, 100K, 200K, 500K). You’d like to crank up the load until something breaks.

Finally, you can test adaptively: try the worst case (usually the easiest to set up), and see if it meets the performance budget. If so, you needn’t tune further until you’ve investigated the other variables. This helps reduce your test cases.

So, identify the tests and configurations you need to make your decisions. Design test data that lets you test those alternatives. (In early development, the data will be approximate, e.g., “a query that joins three tables” without a commitment to a particular schema; as your system is developed more, you can use more realistic data.)
 

5. Run the tests.

Start a diary (journal, log) before you test. Record the configuration you’re testing, and the start and stop times for that particular test. While you run the tests, you may also need to run profiling or system monitoring tools. Record all test results. Note any anomalies – even if something happened only once, it may be critical. Performance tests often expose bugs or problems such as memory leaks.

There are three common problems I see in tests:
A. The tests are run on a box that’s busy with other activities. Use a lightly loaded box if possible, so you know you’re measuring only what you want to. If this isn’t possible, you’ll have to measure the “background noise” of what’s already running, and you’ll have to run the test several times to ensure you’re seeing consistent results.

B. The tests are run with the same values each time. For example, if you send the same query several times, you are no longer testing database performance (for most databases). Instead, you’re testing the database’s ability to cache.
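
One simple remedy is to randomize the values in your test data, so repeated runs exercise the database itself rather than its cache. A sketch (the table and column names are invented):

```python
import random

def make_queries(n, table="orders", id_range=(1, 1_000_000), seed=None):
    """Generate n lookups with varying key values, so repeated runs
    measure database work rather than cache hits."""
    rng = random.Random(seed)
    return [f"SELECT * FROM {table} WHERE id = {rng.randint(*id_range)}"
            for _ in range(n)]

queries = make_queries(1000, seed=42)   # fix the seed only if you need repeatable runs
```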

C. The configuration changes during the test. This invalidates some of your results and comparisons – you need a fair test.
 

6. Analyze and report the results.

I recommend you produce a simple, 2-page “technical note” rather than a large report. (I stole this idea from Ward Cunningham.) This helps encourage you to keep your tests small and focused.

I recommend including this information:
    Executive Summary / Key Findings
    System Model
    Test Conditions
    Results
    Analysis

Here are three analyses you might make from the types of information we’ve described so far. You can certainly devise others for specific questions.

A. Timings on the simple model.

Model: client to net, to web server, to database (t1 to t6)

Notice that:
    (t2 – t1) + (t6 – t5) is the total network delay,
    (t3 – t2) + (t5 – t4) is the processing time in the web server, and
    (t4 – t3) is the database time.
If we saw:
    600 ms – network
        2 ms – processing
    450 ms – database
we’d know that there was no point worrying about processing time if we want to get sub-second response time. (If we reduced processing time to 0 we still wouldn’t hit the goal.)
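
That kind of conclusion is easy to check mechanically. A sketch using the numbers above:

```python
budget_ms = 1000
measured_ms = {"network": 600, "processing": 2, "database": 450}  # from the example

total = sum(measured_ms.values())   # 1052 ms as measured
for component, ms in measured_ms.items():
    best_case = total - ms          # best possible if this component cost nothing
    verdict = "would meet" if best_case < budget_ms else "still misses"
    print(f"zeroing {component:10s}: {best_case:4d} ms -> {verdict} the goal")
```

Zeroing processing still leaves 1050 ms, over budget, so the effort belongs in the network and database.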

B. Response time, throughput, and load
When you’re measuring response time and throughput, you can often build a table like this:

    Throughput     Response Time (ms)        System Load
                   Min   Avg   90%    Max    CPU   I/O   Mem
    100 qu/hr       10    20    22     40    10%   25%   10%
    500 qu/hr        9    25    38     90    55%   40%   25%
    1000 qu/hr      25   140   212   3288    85%   50%   82%

This chart gives lots of information about what’s going on. Notice the 90% column: it tells us the response time that 90% of the transactions beat. We see that we’re pretty comfortable about promising 100 ms response time for 500 queries/hour or fewer. Notice the CPU and memory load at 1000 queries/hour: they’re above the 80% threshold level where we start to expect thrashing. (The response time reflects this.) You can interpolate load values to predict behavior at lower throughputs.
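
The 90% column is just a percentile over your raw timings, and it’s easy to compute directly. A nearest-rank sketch (the sample timings are invented):

```python
import math

def percentile(samples, pct):
    """The value that `pct` percent of samples are at or below (nearest-rank)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]

times_ms = [10, 12, 15, 18, 20, 22, 25, 30, 40, 95]   # ten sample response times
print(percentile(times_ms, 90))   # -> 40: 90% of requests took 40 ms or less
```

Note how the one slow outlier (95 ms) inflates the max but not the 90th percentile; that’s why the 90% column is often a better promise to make than the max.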

If you plot response time versus throughput, you’ll often see a picture like this:

[Figure: response time increases exponentially over 80% load]

In this example, response time is more or less linear for loads below 80%, but shoots way up once that mark is reached. (This is just a rule of thumb; you might hit the problem at 75% or 90% instead.)
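
A simple queueing model suggests why the curve bends this way: in an M/M/1 queue, average response time is the service time divided by (1 − utilization), so it blows up as load approaches 100%. A back-of-envelope sketch, not a substitute for measurement:

```python
def avg_response(service_ms, utilization):
    """Average response time in an M/M/1 queue: R = S / (1 - rho)."""
    assert 0 <= utilization < 1, "the queue is unstable at 100%+ load"
    return service_ms / (1 - utilization)

# A 10 ms service time stays tolerable until load nears saturation:
for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"load {rho:4.0%}: {avg_response(10, rho):7.1f} ms")
```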

C. Load mix analysis
Suppose you’ve tested one query type at a time, and you see this:

    Query      CPU   I/O   Memory
    query 1    12%   80%    28%
    query 2    96%    8%    12%

If the expected load mix is 75% query 1 and 25% query 2, you can estimate the combined load. In this case, CPU will be .75*12% + .25*96% = 9% + 24% = 33%. The full table then looks like this:

    Query              CPU   I/O   Memory
    query 1            12%   80%    28%
    query 2            96%    8%    12%
    75% q1 + 25% q2    33%   62%    24%

If we had to deal with query 1 or query 2 only, we’d expect problems (from the 80% load level). When we consider them together, the system as a whole is OK.
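
The weighted-average arithmetic, spelled out (each column is computed the same way the CPU figure was above):

```python
def mixed_load(per_query_loads, mix):
    """Estimate combined resource load (in percent) from per-query loads and a mix."""
    resources = next(iter(per_query_loads.values()))
    return {r: sum(mix[q] * per_query_loads[q][r] for q in mix)
            for r in resources}

loads = {"query 1": {"CPU": 12, "I/O": 80, "Memory": 28},
         "query 2": {"CPU": 96, "I/O": 8,  "Memory": 12}}
mix = {"query 1": 0.75, "query 2": 0.25}

print(mixed_load(loads, mix))   # CPU 33%, I/O 62%, Memory 24%
```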
 

In general:
Be conservative in your analysis. If you see one application server handling 50K transactions/hour, that doesn’t mean 4 application servers can handle 200K transactions/hour. (They might interfere with each other – you can’t tell without some sort of scalability test that tries many application servers.) Conservatively, you can say that you wouldn’t expect to exceed 200K transactions/hour.

7. Conclusion

A performance spike can help you assess stories during the release planning phase. When the customer writes a performance-centered story, or when you’re evaluating design alternatives, you can run some quick tests to decide if a system is feasible, or to choose between designs. In this way, the spike helps you establish your architecture.

[October 4, 2000.]