Time Troubles – Binary Timer Overflow

A fixed-size counter (triggered by some sort of pulse) is a common way to represent time.


When you use this approach, you generally have to answer three questions:

  1. What moment does “0” represent? (Sometimes called the “epoch”.)
  2. What time interval does adding 1 represent? (Are you counting milliseconds, seconds, or what?)
  3. How many bits are in the counter?

Answering those questions lets you answer several other questions: (For all these, make sure your units line up)

  • Q: What’s the representation of the current time? A: (t – epoch) / interval
  • Q: What time does this timer count represent? A: epoch + n * interval
  • Q: When will the timer run out of room? A: epoch + max * interval (where max is the largest counter value)

Examples

There are a number of examples where overflow affected (or will affect) systems:

  • Windows 98 consistently crashed after 49.7 days of uptime (2^32 msec). (API call GetTickCount() returns 32-bit milliseconds of uptime.)
  • Boeing 787s shut down after 248 days: 2^31 * 0.1 sec.
  • Airbus 350s require a reboot every 149 hours. (I didn’t find a source saying exactly what the overflow was.)
  • Deep Impact 2013: The Deep Impact space probe, used to study comet Tempo 1, was believed to have been lost because its timer counter overflowed after 2^32 * 0.1 sec. Starting in 2000, it ran out on 11 August 2013.
  • Anonymous tracking system: Called 16-bit time functions (unintentionally) on Windows to measure the duration of a task, causing it to compute task duration wrong, and crash when run overnight.
  • Y2036 Problem: NTP (Network Time Protocol, that allows computers to synchronize their clocks) uses a 32-bit unsigned counter of seconds since 1 January 1900, and will run out in 2036.
  • Y2038 Problem: Many Unix systems represent time as a signed 32-bit count of the number of seconds since 1 January 1970, and will run out on 19 January 2038.
  • Y292,277,026,596 Problem: Systems that use the Unix scheme with a 64-bit count will overflow. This sounds far away, but note that problems may occur in years preceding, e.g., retirement calculations or long-term mortgages may need to project beyond that date.

Detecting Timer Overflow

How will you know a timer overflow occurred? It’s a tricky problem.

Before Runtime

  • Analytic: When you set a starting point for your timer, a time interval, and a fixed counter size, you’ve effectively decided the maximum time you can represent. Can your system run into these limits?
  • Hardware Review and Code Review: Systematic human review, from a time management perspective, may be able to identify whether a timer can (or will be) properly managed, or at least identify that it is not.
  • Tools: Compilers, linters, and special-purpose tools may be able to identify code that (potentially) doesn’t handle timer overflow.

Built-In Detection

Detection isn’t the whole story; you also need to address the problem.

  • Exceptions: Some languages or hardware provide an exception when an overflow occurs for signed and/or unsigned values.
  • Flags – Timer overflow, arithmetic overflow, or carry: Many processors can detect when incrementing or adding to a value pushes it out of range.
  • Timer Overflow Interrupt: Some systems (e.g., Arduino) provide an interrupt when a timer overflows, where you can install a handler to detect and address it.
  • Check for Wrapping: In some languages, you can tell when a timer has wrapped because the value decreases instead of increases. This may require that you don’t wait too long between checking values – if it could wrap twice, or wrap beyond the previous value, you won’t be able to tell.

Run-Time Detection

  • Soak Test: Run the system at load for a “long” period of time, hopefully long enough to give any timer-related problems long enough to reveal themselves.
  • Debug Crashes, esp. “Cyclic” Crashes: When a failure or crash occurs “randomly”, it’s worth trying to see if it is really occurring a consistent amount of time after some event. For example, if the app consistently crashes 18.2 hours after it starts, you might look for a 16-bit counter of seconds.

Solutions

Timer Scaling: Some systems let you explicitly control the interval. For example, you can choose whether your timer has a one-millisecond or a one-second “tick”. You may be able to modify the design to deal with a larger tick (thus increasing the time before the timer runs out).

Timer Extension: Keep a second (or beyond) counter, incremented when the first overflows. Think of it as a “carry”. You still have to decide where this extra space lives and how it’s managed. And you may have extra synchronization problems. (See Lamport article in the references.)

Timer Expansion: Get a bigger boat: use a counter with more bits. Note that this can cause legacy problems: input, storage, output, APIs, cross-system compatibility, etc.

Change the Starting Point: Use a later starting point to give you a later rollover time. For example, you were thinking of using Unix’ 1-Jan-1970 “epoch”, but your system deploys in 2020 or later, so you use 1-Jan-2020 instead. (Note that this is asking for trouble if you have to interact with 1970-based dates.)

Timer Conversion Charts

“𝛑 seconds is a nanocentury.” – Tom Duff (from More Programming Pearls)

Since time is measured in a mix of base-10 and base-60 values, with a few 12s and 24s thrown in, it’s hard to relate a time that’s a power of 2 or power of 10 bigger. These charts help you do that: you can easily see that a 16-bit count of seconds is 18.2 hours, but a 32-bit count is 136.2 years.

For this table, I used Google’s time converter, where 1 day is 86400 sec, and 1 year is 365 days. I rounded most numbers. Verify any critical numbers before you use them.

Time Conversion Chart – ns to sec

bits 1 nsec1 μsec1 ms1 sec
71.3e-7 sec1.3e-4 sec0.13 sec2.1 min
82.6e-7 sec2.6e-4 sec0.26 sec4.3 min
153.3e-5 sec0.033 sec32.8 sec9.1 hours
166.6e-5 sec0.066 sec1.1 min18.2 hours
238.4e-3 sec8.4 sec2.3 hours97.1 days
241.7e-2 sec16.8 sec4.7 hours194.2 days
312.1 sec35.8 min24.9 days68.1 years
324.3 sec71.6 min49.7 days136.2 years
63292.4 years2.92e5 years2.92e8 years2.92e11 years
64584.9 years5.85e5 years5.85e8 years5.85e11 years

Time Conversion Chart – ms to sec

bits .001 s (=1 ms).01 s.1 ssec
70.1 sec1.3 sec12.7 sec2.1 min
80.26 sec2.6 sec25.6 sec4.3 min
1532.8 sec5.5 min54.6 min9.1 hours
161.1 min10.9 min1.8 hours18.2 hours
232.3 hours23.3 hours9.7 days97.1 days
244.7 hours1.9 days19.4 days194.2 days
3124.9 days248.6 days6.8 years68.1 years
3249.7 days1.4 years13.6 years136.2 years
632.92e8 years2.92e9 years2.92e10 years2.92e11 years
645.85e8 years5.85e9 years5.85e10 years5.85e11 years

Time Conversion Chart – Seconds to Years

bits  secmindayyear
72.1 min2.1 hours0.35 years127 years
84.3 min4.3 hours0.7 years255 years
159.1 hours22.8 days89.8 years32,767 years
1618.2 hours45.5 days179.5 years65,535 years
2397.1 days16.0 years22,982.5 years8.4e6 years
24194.2 days31.9 years45,965.0 years1.7e7 years
3168.1 years4,085.8 years5.9e6 years2.1e9 years
32136.2 years8,171.6 years1.2e7 years4.3e9 years
632.92e11 years1.75e13 years2.5e16 years9.2e18 years
645.85e11 years3.51e13 years5.05e16 years1.8e19 years

Acknowledgments

Thanks to Lisa Crispin and Mat Bess for reviewing an earlier draft. (Errors and shortcomings are mine, of course.)

References