Categories
benchmarking java jvm jvm-hotspot microbenchmark

How do I write a correct micro-benchmark in Java?

How do you write (and run) a correct micro-benchmark in Java?

I’m looking for some code samples and comments illustrating various things to think about.

Example: Should the benchmark measure time/iteration or iterations/time, and why?

Related: Is stopwatch benchmarking acceptable?

Tips about writing micro benchmarks from the creators of Java HotSpot:

Rule 0: Read a reputable paper on JVMs and micro-benchmarking. A good one is Brian Goetz, 2005. Do not expect too much from micro-benchmarks; they measure only a limited range of JVM performance characteristics.

Rule 1: Always include a warmup phase which runs your test kernel all the way through, enough to trigger all initializations and compilations before the timing phase(s). (Fewer iterations are OK in the warmup phase. The rule of thumb is several tens of thousands of inner-loop iterations.)
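A minimal hand-rolled sketch of Rule 1, assuming a hypothetical kernel method stands in for the code under test: the same run loop executes once as warmup (result discarded) and then again for timing. Results are accumulated into a sink field so the JIT cannot simply drop the work as dead code. This is only a rough illustration; the JIT may still optimize such loops in ways that distort the numbers.

```java
final class WarmupThenMeasure {
    // Accumulates results so the JIT cannot treat the kernel's work as dead code.
    static long sink;

    // Hypothetical kernel under test: sum of squares of 0..n-1.
    static long kernel(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Calls the kernel `iterations` times and returns the elapsed nanoseconds.
    static long run(int iterations, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Vary the input slightly so the call is not loop-invariant.
            sink += kernel(n + (i & 7));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int n = 1_000;
        // Warmup phase: several tens of thousands of calls, timing discarded.
        run(50_000, n);
        // Timing phase: same code path, now (hopefully) fully compiled.
        final int iterations = 100_000;
        long elapsed = run(iterations, n);
        System.out.printf("%.1f ns per call (sink=%d)%n", (double) elapsed / iterations, sink);
    }
}
```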

Rule 2: Always run with -XX:+PrintCompilation, -verbose:gc, etc., so you can verify that the compiler and other parts of the JVM are not doing unexpected work during your timing phase.

Rule 2.1: Print messages at the beginning and end of timing and warmup phases, so you can verify that there is no output from Rule 2 during the timing phase.
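A sketch of Rule 2.1, assuming a hypothetical kernel: each phase is wrapped in begin/end banners printed outside the timed window. When launched with -XX:+PrintCompilation -verbose:gc, any compilation or GC line appearing between "begin timing" and "end timing" suggests the timing phase was disturbed. Printing the banners during warmup also means the printing code is already loaded before the timing phase (Rule 4).

```java
final class PhaseMarkers {
    // Accumulates results so the work is not eliminated as dead code.
    static long sink;

    // Hypothetical kernel under test.
    static long kernel(int x) {
        return (long) x * x % 1_000_003;
    }

    // Runs one phase between two banners. The banners are printed outside the
    // timed window, so printing does not count toward the measured time.
    static long phase(String name, int iterations) {
        System.out.println("=== begin " + name + " ===");
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += kernel(i);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("=== end " + name + " (" + elapsed + " ns) ===");
        return elapsed;
    }

    public static void main(String[] args) {
        phase("warmup", 50_000);
        phase("timing", 1_000_000);
        System.out.println("sink=" + sink);
    }
}
```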

Rule 3: Be aware of the difference between -client and -server, and between OSR (on-stack replacement) and regular compilations. The -XX:+PrintCompilation flag reports OSR compilations with an at-sign to denote the non-initial entry point, for example: Trouble$1::run @ 2 (41 bytes). Prefer server to client, and regular to OSR, if you are after best performance.
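One common way to steer toward regular rather than OSR compilation, sketched below under the assumption that the code under test fits in a single method (here a hypothetical kernel), is to avoid timing one giant loop inside main and instead call a small method many times. With -XX:+PrintCompilation you may still see an early OSR entry (marked with @) for kernel's inner loop, but repeated invocations during warmup should lead to a regular compilation of the whole method before timing starts.

```java
final class AvoidOsr {
    // Accumulates results so the work is not eliminated as dead code.
    static long sink;

    // The hot work lives in its own method. Because it is invoked many times,
    // HotSpot can give it a regular (method-entry) compilation instead of
    // relying only on an OSR compilation of one long-running loop.
    static long kernel(int[] data) {
        long sum = 0;
        for (int v : data) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // Warmup: many separate invocations of kernel.
        for (int i = 0; i < 20_000; i++) {
            sink += kernel(data);
        }
        // Timing: the same invocations, measured.
        final int calls = 20_000;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            sink += kernel(data);
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("%.1f ns per call (sink=%d)%n", (double) elapsed / calls, sink);
    }
}
```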

Rule 4: Be aware of initialization effects. Do not print for the first time during your timing phase, since printing loads and initializes classes. Do not load new classes outside of the warmup phase (or final reporting phase), unless you are testing class loading specifically (and in that case load only the test classes). Rule 2 is your first line of defense against such effects.

Rule 5: Be aware of deoptimization and recompilation effects. Do not take any code path for the first time in the timing phase, because the compiler may junk and recompile the code, based on an earlier optimistic assumption that the path was not going to be used at all. Rule 2 is your first line of defense against such effects.
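A sketch of Rule 5, assuming a hypothetical kernel with a rarely taken branch: if warmup fed only non-negative inputs, the JIT could compile kernel on the assumption that the x &lt; 0 branch never runs, and then deoptimize the first time the timing phase hits it. Feeding warmup the same input mix as the timing phase avoids taking that path for the first time while the clock is running.

```java
final class WarmAllPaths {
    // Accumulates results so the work is not eliminated as dead code.
    static long sink;

    // Hypothetical kernel with a branch that is rarely taken.
    static long kernel(int x) {
        if (x < 0) {
            return -(long) x * 3; // rare path
        }
        return (long) x * 2;      // common path
    }

    // Builds inputs in which roughly 1 in 10 values is negative, so both
    // branches are exercised (and profiled) during warmup.
    static int[] mixedInputs(int size) {
        int[] inputs = new int[size];
        for (int i = 0; i < size; i++) {
            inputs[i] = (i % 10 == 0) ? -i : i;
        }
        return inputs;
    }

    static long runOnce(int[] inputs) {
        long sum = 0;
        for (int x : inputs) {
            sum += kernel(x);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] inputs = mixedInputs(10_000);
        // Warmup uses the same input mix as the timing phase, so no branch is
        // seen for the first time during measurement.
        for (int i = 0; i < 2_000; i++) {
            sink += runOnce(inputs);
        }
        final int rounds = 2_000;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += runOnce(inputs);
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("%.1f ns per round (sink=%d)%n", (double) elapsed / rounds, sink);
    }
}
```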

Rule 6: Use appropriate tools to read the compiler’s mind, and expect to be surprised by the code it produces. Inspect the code yourself before forming theories about what makes something faster or slower.

Rule 7: Reduce noise in your measurements. Run your benchmark on a quiet machine, and run it several times, discarding outliers. Use -Xbatch to serialize the compiler with the application, and consider setting -XX:CICompilerCount=1 to prevent the compiler from running in parallel with itself. Try your best to reduce GC overhead: set -Xmx (large enough) equal to -Xms, and use -XX:+UseEpsilonGC if it is available (it is an experimental option, so it also needs -XX:+UnlockExperimentalVMOptions).
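The "run it several times, discarding outliers" part of Rule 7 can be sketched as below, assuming a hypothetical kernel. Reporting the median (and the minimum) of several trials is one simple way to keep a few noisy runs from dominating the result; it is not the only reasonable choice. The JVM flags from this rule go on the command line and are not shown.

```java
import java.util.Arrays;

final class RepeatedTrials {
    // Accumulates results so the work is not eliminated as dead code.
    static long sink;

    // Hypothetical kernel under test.
    static long kernel(int x) {
        return (long) x * 31 + 7;
    }

    // Times one trial of `iterations` kernel calls, in nanoseconds.
    static long trialNanos(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += kernel(i);
        }
        return System.nanoTime() - start;
    }

    // Median of a non-empty sample array. Unlike the mean, it is barely moved
    // by a few outliers caused by GC pauses, JIT activity, or other processes.
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        trialNanos(1_000_000); // warmup, result discarded
        long[] samples = new long[15];
        for (int t = 0; t < samples.length; t++) {
            samples[t] = trialNanos(1_000_000);
        }
        long min = Arrays.stream(samples).min().getAsLong();
        System.out.println("min=" + min + " ns, median=" + median(samples) + " ns, sink=" + sink);
    }
}
```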

Rule 8: Use a benchmarking library such as JMH, Caliper, or Bill and Paul’s Excellent UCSD Benchmarks for Java; it is probably more efficient than a hand-rolled harness and has already been debugged for this sole purpose.

  • This was also an interesting article: ibm.com/developerworks/java/library/j-jtp12214

    Jul 10, 2010 at 22:29

  • Also, never use System.currentTimeMillis() unless you are OK with ±15 ms accuracy, which is typical on most OS + JVM combinations. Use System.nanoTime() instead.

    Apr 22, 2011 at 18:43


  • A paper from JavaOne: azulsystems.com/events/javaone_2009/session/…

    – bestsss

    Jun 5, 2011 at 12:29

  • It should be noted that System.nanoTime() is not guaranteed to be more accurate than System.currentTimeMillis(). It is only guaranteed to be at least as accurate. It usually is substantially more accurate, however.

    – Gravity

    Jul 27, 2011 at 8:00


  • The main reason why one must use System.nanoTime() instead of System.currentTimeMillis() is that the former is guaranteed to be monotonically increasing. Subtracting the values returned by two currentTimeMillis() invocations can actually give negative results, possibly because the system time was adjusted by some NTP daemon.

    – Waldheinz

    Mar 16, 2015 at 10:51
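A small sketch of the pattern these comments recommend: read System.nanoTime() before and after the work and subtract. The Javadoc describes nanoTime() as usable only for measuring elapsed time, with values meaningful only as differences taken within the same JVM, whereas currentTimeMillis() reports wall-clock time, which can jump when the system clock is adjusted. The helper name elapsedNanos and the 20 ms sleep are placeholders for illustration.

```java
import java.util.concurrent.TimeUnit;

final class ElapsedTime {
    // Measures elapsed time with System.nanoTime(). Only the difference
    // between two readings is meaningful; the absolute value has no fixed origin.
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Placeholder workload: sleep for about 20 ms.
        long ns = elapsedNanos(() -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("elapsed: %d ms (%d ns)%n", TimeUnit.NANOSECONDS.toMillis(ns), ns);
    }
}
```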

I know this question has been marked as answered, but I wanted to mention two libraries that help us write micro-benchmarks:

Caliper from Google

Getting started tutorials

  1. http://codingjunkie.net/micro-benchmarking-with-caliper/
  2. http://vertexlabs.co.uk/blog/caliper

JMH from OpenJDK

Getting started tutorials

  1. Avoiding Benchmarking Pitfalls on the JVM
  2. Using JMH for Java Microbenchmarking
  3. Introduction to JMH

Important things for Java benchmarks are:

  • Warm up the JIT first by running the code several times before timing it
  • Make sure you run it for long enough to be able to measure the results in seconds or (better) tens of seconds
  • While you can’t call System.gc() between iterations, it’s a good idea to run it between tests, so that each test will hopefully get a “clean” memory space to work with. (Yes, gc() is more of a hint than a guarantee, but it’s very likely that it really will garbage collect in my experience.)
  • I like to display iterations and time, and a score of time/iteration which can be scaled such that the “best” algorithm gets a score of 1.0 and others are scored in a relative fashion. This means you can run all algorithms for a longish time, varying both number of iterations and time, but still get comparable results.
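The approach in the bullets above can be sketched as follows, with two hypothetical algorithms: each is warmed up, timed, and reported as time per iteration, System.gc() is called (as a hint only) before each test, and the scores are scaled so the fastest algorithm gets 1.0. The iteration counts are kept small for brevity; the answer suggests running long enough that measurements take seconds. Running several candidates in one JVM can also bias results through ordering and shared call sites, which is one reason harnesses like JMH fork a fresh JVM per benchmark.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntToLongFunction;

final class ScoredComparison {
    // Accumulates results so the work is not eliminated as dead code.
    static long sink;

    // Runs `algo` for `iterations` calls and returns the average nanoseconds per call.
    static double nanosPerIteration(IntToLongFunction algo, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += algo.applyAsLong(i);
        }
        return (double) (System.nanoTime() - start) / iterations;
    }

    // Scales time per iteration so the fastest entry scores 1.0 and a score
    // of 2.0 means "twice as slow as the best".
    static Map<String, Double> normalize(Map<String, Double> timePerIteration) {
        double best = timePerIteration.values().stream()
                .mapToDouble(Double::doubleValue).min().orElse(1.0);
        Map<String, Double> scores = new LinkedHashMap<>();
        timePerIteration.forEach((name, t) -> scores.put(name, t / best));
        return scores;
    }

    public static void main(String[] args) {
        // Two hypothetical algorithms computing 0 + 1 + ... + (n % 1000).
        Map<String, IntToLongFunction> algos = new LinkedHashMap<>();
        algos.put("loop", n -> {
            long s = 0;
            for (int k = 0; k <= n % 1000; k++) {
                s += k;
            }
            return s;
        });
        algos.put("formula", n -> {
            long m = n % 1000;
            return m * (m + 1) / 2;
        });

        final int iterations = 1_000_000;
        Map<String, Double> times = new LinkedHashMap<>();
        for (Map.Entry<String, IntToLongFunction> e : algos.entrySet()) {
            System.gc();                              // only a hint: aim for a "clean" heap per test
            nanosPerIteration(e.getValue(), 100_000); // warmup, result discarded
            times.put(e.getKey(), nanosPerIteration(e.getValue(), iterations));
        }
        Map<String, Double> scores = normalize(times);
        times.forEach((name, t) -> System.out.printf(
                "%-8s %,d iterations  %8.2f ns/iter  score %.2f%n",
                name, iterations, t, scores.get(name)));
    }
}
```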

I’m just in the process of blogging about the design of a benchmarking framework in .NET. I’ve got a couple of earlier posts which may be able to give you some ideas – not everything will be appropriate, of course, but some of it may be.

  • Minor nitpick: IMO “so that each test gets” should be “so that each test might get” since the former gives the impression that calling gc always frees up unused memory.

    Apr 20, 2013 at 6:52


  • @SanjayT.Sharma: Well, the intention is that it actually does. While it’s not strictly guaranteed, it’s actually a pretty strong hint. Will edit to be clearer.

    – Jon Skeet

    Apr 20, 2013 at 8:02

  • I don’t agree with calling System.gc(). It is a hint, that’s all. Not even “it will hopefully do something”. You should never ever call it. This is programming, not art.

    Jun 14, 2013 at 10:38

  • @gyabraham: Yes, it’s a hint – but it’s one which I’ve observed to usually be taken. So if you don’t like using System.gc(), how do you propose to minimize garbage collection in one test due to objects created in previous tests? I’m pragmatic, not dogmatic.

    – Jon Skeet

    Jun 14, 2013 at 10:58

  • @gyabraham: I don’t know what you mean by “great fallback”. Can you elaborate, and again – do you have a proposal to give better results? I did explicitly say that it’s not a guarantee…

    – Jon Skeet

    Jun 14, 2013 at 17:44