Keirin
As part of my research for Clojure: The Essential Reference, I looked into the performance of the Clojure vector. From a theoretical perspective, creating a vector of n elements is O(n log n). However, the received wisdom was that in practice the Clojure vector’s performance is linear, similar to a resizable array in other programming languages.
I set about measuring the performance of vector using Criterium, a microbenchmarking library for Clojure, and got some unexpected results.
The performance starts out as linear, but then as n increases it becomes quadratic. I started to look into it…
Compilations and garbage collections
The problem was garbage collections (‘GCs’). Criterium tries to reduce their impact by requesting a garbage collection before running the payload function (the function under test). However, the JVM gives no guarantee that it will perform a GC when one is requested, and no guarantee that it will run a function without performing a GC.
A further problem is that if the results of the payload function are discarded, the JVM can optimise away calls to the payload function altogether. Criterium runs the payload function multiple times and, to prevent these calls being optimised away, holds on to the output from each run. But if the payload function allocates a lot of memory, holding on to these outputs puts pressure on the garbage collector, making a GC more likely.
The insight that led to Keirin was that a GC cannot be prevented, but if one occurs it can be detected and the timed run discarded. If the appropriate diagnostics are enabled, the JVM writes to a log file every time a GC occurs. Sample the log file before and after the run of the payload function: if no new GCs have been written to the file, the run was clean. On the other hand, if a GC has been written to the file, it may have occurred during the execution of the payload function. In this case the run might have been affected, so the result is discarded.
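Keirin itself watches the GC log, but the same discard-on-GC idea can be sketched with the JVM’s management beans instead of log-file parsing. This is a simplified illustration in Java; `GcDetect`, `timeUnlessGc` and the return-`-1`-to-discard convention are my own inventions for the sketch, not Keirin’s API:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.Supplier;

public class GcDetect {
    static long lastHash; // keeps the payload's result observable, so the call is not optimised away

    // Total number of collections across all collectors so far.
    static long gcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if the count is undefined
            if (c > 0) total += c;
        }
        return total;
    }

    // Time one run of the payload; return -1 to signal "discard this run"
    // if the GC count changed while the payload was executing.
    static long timeUnlessGc(Supplier<?> payload) {
        long before = gcCount();
        long start = System.nanoTime();
        Object result = payload.get();
        long elapsed = System.nanoTime() - start;
        lastHash = (result == null) ? 0 : result.hashCode();
        if (gcCount() != before) return -1; // a GC ran during the payload: discard
        return elapsed;
    }
}
```

Sampling a counter before and after the run is cheaper than parsing a log, but the log-file approach used by Keirin can also capture GCs that the beans report asynchronously.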
A lesser, but still real, problem is compilation and class loading. Keirin has a warm-up period in which it executes the payload function repeatedly to encourage the JVM to compile the bytecode and load any needed classes. However, it is still possible for the JVM to compile bytecode or load a class within a timed run. If this happens, Keirin again detects it and discards the run.
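One way to detect compilation and class loading around a run is via the JVM’s compilation and class-loading beans. Again this is my own sketch of the technique, not Keirin’s actual detection code; `ranWithoutJitActivity` is a hypothetical name:

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitDetect {
    // True if no JIT compilation time accrued and no classes were loaded
    // while the payload ran; otherwise the caller should discard the run.
    static boolean ranWithoutJitActivity(Runnable payload) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean(); // may be null on a JVM with no JIT
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        long jitBefore = (jit != null && jit.isCompilationTimeMonitoringSupported())
                ? jit.getTotalCompilationTime() : 0;
        long loadedBefore = cl.getTotalLoadedClassCount();
        payload.run();
        long jitAfter = (jit != null && jit.isCompilationTimeMonitoringSupported())
                ? jit.getTotalCompilationTime() : 0;
        return jitAfter == jitBefore && cl.getTotalLoadedClassCount() == loadedBefore;
    }
}
```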
To put the least possible pressure on the garbage collector, Keirin holds on to only a hash of the payload function’s output rather than the output itself.
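The effect of this is that each result stays live only long enough to be hashed, after which it can be collected. A minimal sketch of the idea (the `HashSink` class and its mixing scheme are illustrative, not Keirin’s internals):

```java
public class HashSink {
    private long acc = 17; // combined hash of all results seen so far

    // Record only the hash of the payload's output; the output itself
    // becomes unreachable immediately and adds no lasting GC pressure,
    // yet the JVM still cannot treat the payload call as dead code.
    public void sink(Object result) {
        acc = 31 * acc + (result == null ? 0 : result.hashCode());
    }

    public long value() { return acc; }
}
```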
Timing loop
Criterium is designed to time functions that sometimes run in only a few nanoseconds. However, the timers provided by the JVM, despite their name, are not nanosecond-accurate. To work around this, Criterium times multiple invocations of the payload function and divides the total by the number of invocations.
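The basic loop-and-divide technique looks something like the following Java sketch (my own illustration, not code from either library; `nanosPerCall` is a hypothetical name):

```java
import java.util.function.Supplier;

public class TimingLoop {
    static long resultHash; // holding a hash of each result keeps the calls live

    // Mean nanoseconds per call over n invocations. A single call is far
    // below the timer's real resolution, but the total over many calls
    // is long enough to measure.
    static double nanosPerCall(Supplier<?> payload, int n) {
        long hash = 17;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            Object r = payload.get();
            hash = 31 * hash + ((r == null) ? 0 : r.hashCode());
        }
        long elapsed = System.nanoTime() - start;
        resultHash = hash;
        return (double) elapsed / n;
    }
}
```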
This, however, introduces a new problem: the timing loop that repeatedly runs the payload function has its own overhead. So on start-up Criterium estimates this overhead and subtracts it from the timed run.
This, too, introduces problems. As Criterium’s README.md explains, if the machine is busy at the point that Criterium calculates the overhead, it may overestimate it, and subsequent attempts to benchmark quick functions can return negative times.
The other problem is that the overhead is not constant: it is a function of the number of times the payload function is executed, which in turn depends on how fast the payload function is. The faster the payload function, the more times it needs to be executed to ensure that the inaccuracies in the timing are amortised away.
There is no perfect solution to this problem, but Keirin takes a slightly different approach: it always includes the timing overhead in the times returned. This ensures that the times returned are consistent. Further, if the payload function is not in the nanosecond range, the overhead can simply be ignored. On the other hand, if it is in the nanosecond range, there is an option to dynamically calculate the overhead.
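One way to calculate the overhead dynamically is to time a do-nothing payload with the same iteration count immediately before the real measurement, so the estimate reflects current machine conditions. This is my own sketch of the general technique, not Keirin’s implementation:

```java
import java.util.function.Supplier;

public class Overhead {
    static long resultHash; // keeps the payload calls live

    static long timeLoop(Supplier<?> payload, int n) {
        long hash = 17;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            Object r = payload.get();
            hash = 31 * hash + ((r == null) ? 0 : r.hashCode());
        }
        resultHash = hash;
        return System.nanoTime() - start;
    }

    // Estimate the loop's own overhead by timing a do-nothing payload with
    // the same iteration count, then subtract it from the real measurement.
    static double nanosPerCallMinusOverhead(Supplier<?> payload, int n) {
        long overhead = timeLoop(() -> null, n);
        long raw = timeLoop(payload, n);
        return Math.max(0, raw - overhead) / (double) n; // clamp to avoid negative times
    }
}
```

Clamping at zero avoids the negative-time problem described above, at the cost of rounding very fast functions down.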
Whilst testing Keirin, I noticed that the more times the payload function was executed, the faster it appeared to run. The JVM detects “hotspots”, parts of a program that are repeatedly executed, and applies further optimisations to them. My hypothesis was that the payload function was detected as a hotspot after a certain number of iterations and then further optimised; hence the apparent speed-up. As a result, I changed Keirin to run the timing loop twice and throw away the first time, giving the JVM an opportunity to apply hotspot optimisations before the timed run. With this change, in my testing Keirin reports the same execution time even as the number of executions of the payload function is increased.
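The two-pass structure can be sketched as follows (again an illustration of the idea in Java, with names of my own choosing):

```java
import java.util.function.Supplier;

public class TwoPass {
    static long resultHash; // keeps the payload calls live

    static long timeLoop(Supplier<?> payload, int n) {
        long hash = 17;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            Object r = payload.get();
            hash = 31 * hash + ((r == null) ? 0 : r.hashCode());
        }
        resultHash = hash;
        return System.nanoTime() - start;
    }

    // Run the loop twice and report only the second pass, giving HotSpot a
    // chance to optimise both the payload and the loop before the real timing.
    static long timedSecondPass(Supplier<?> payload, int n) {
        timeLoop(payload, n); // warm-up pass; its time is discarded
        return timeLoop(payload, n);
    }
}
```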
Statistics
Keirin takes a different approach to statistics from Criterium. Criterium performs multiple timed runs and then takes the sample mean and sample standard deviation. The author of Criterium recognised that both the mean and the standard deviation are affected by outliers, and uses bootstrapping to estimate the impact of these outliers on the results.
However, I prefer to make as few assumptions as possible about the underlying probability distribution of the measurements. I feel uncomfortable assuming that the mean is well defined and that the variance is finite. Instead Keirin uses simple statistics that are robust to outliers: the median and the median absolute deviation (‘MAD’).
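Both statistics are straightforward to compute; here is a minimal sketch of the standard definitions (my own code, not Keirin’s):

```java
import java.util.Arrays;

public class RobustStats {
    // Middle value of the sorted sample (mean of the two middle values
    // for an even-sized sample).
    static double median(double[] xs) {
        double[] s = xs.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Median absolute deviation: the median of |x - median(xs)|.
    static double mad(double[] xs) {
        double m = median(xs);
        double[] dev = new double[xs.length];
        for (int i = 0; i < xs.length; i++) dev[i] = Math.abs(xs[i] - m);
        return median(dev);
    }
}
```

Note how a single extreme run barely moves either statistic: for the timings {1, 2, 3, 4, 100} the median is 3 and the MAD is 1, whereas the mean (22) is dominated by the outlier.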
Final thoughts
Using Keirin I was able to see how the Clojure vector actually performs and the received wisdom is correct – performance is linear!
Writing an accurate and consistent benchmarking library for the JVM is hard due to the lack of accurate timers and the dynamic nature of the JVM. There is no perfect or canonical way to approach the problem, but I hope that Keirin will contribute to the existing body of work in this area. Keirin is available on GitHub.
Picture of a cyclist courtesy of Tim Rademacher. Used under the Creative Commons CC BY-SA 4.0 licence.