How to stop a compiler

Suppose I want to write a microbench to test something.  Here’s a for-instance:

  // Time sqrt
  long start = System.currentTimeMillis();
  for( int i=0; i<1000000; i++ )
    Math.sqrt(i);
  long stop  = System.currentTimeMillis();
  System.out.println("sqrts/sec="+1000000.0/((double)((stop-start)/1000)));

I run this and get:

sqrts/sec=Infinity

Urr… what?  Ah!  My fast X86 can do so many sqrts in a second, that “stop-start” is less than 1000 milliseconds, then “(stop-start)/1000” rounds down to zero as integer math.  Try again:

  // Time sqrt, Round 2
  long start = System.currentTimeMillis();
  for( int i=0; i<1000000; i++ )
    Math.sqrt(i);
  long stop  = System.currentTimeMillis();
  System.out.println("sqrts/sec="
                     +1000000.0/(((double)(stop-start))/1000.0));

This time I get:

  sqrts/sec=2.5E8

Ahhh… a smug smile, that X86 is really fast.  It’s a 2.5Ghz Xeon, so that’s… lets see… (2.5e9 clks/sec) / (2.5e8 sqrts/sec) = 10 clks/sqrt.  Humm… that IS really fast.  Let’s raise the bar a little: let’s try 10 million sqrts instead of 1 million:

  // Time sqrt, Round 3
  long start = System.currentTimeMillis();
  for( int i=0; i<10000000; i++ )
    Math.sqrt(i);
  long stop  = System.currentTimeMillis();
  System.out.println("sqrts/sec="
                     +10000000.0/(((double)(stop-start))/1000.0));

This time I get:

  sqrts/sec=4.5454545454545456E8

Yeek!  TWO alarm bells go off in my head!

First Alarm: 10x more work but only about 2x more time; now I’m down to 5 clks/sqrt!

Second Alarm: that repeating fraction result: it tells me I’ve likely got a REALLY small number of milliseconds that I’m dividing.  In fact… it’s 22 milliseconds.  Very suspicious!  Yup, the compiler is tossing out my inner loop as being dead.  To confirm, I’ll switch to 1billion sqrts.  This time I get:

  sqrts/sec=3.846153846153846E10

That’s roughly 15 sqrts PER CLOCK CYCLE – not 15 clks/sqrt.  Yup, that’s one speed-demon Xeon.  Or a brainiac compiler.  To defeat the compiler I need to use all the results of the inner loop in the final output.  Here’s one way (and notice I’m back to 1million iterations):

  // Time sqrt, Round 4
  long start = System.currentTimeMillis();
  double sum = 0.0;  
  for( int i=0; i<1000000; i++ )
    sum += Math.sqrt(i);  // use the results in some cheap way
  long stop  = System.currentTimeMillis();
  if( sum == 1234567.0 )  // final usage of ALL results
    System.out.print("woohoo!"); 
  System.out.println("sqrts/sec="
                     +1000000.0/(((double)(stop-start))/1000.0));

Now I’m measuring 1 sqrt and 1 double-precision addition per iteration… but I know the add is fairly fast, or least it’s fast relative to square root.  This time I get:

sqrts/sec=7692307.692307692

Ahhh… about 7.7million sqrts/sec, or 325 clks/sqrt.  I know double-precision add is much less than this, so the extra add does not futz with my result numbers.  And no “woohoo!” in the output.  In fact, it’s vanishingly rare that I’ll get an exact match and get my output polluted.  And I finally got some sane numbers out.

Ok, really I want to time various HashTable.get/put implementations but the basic issues are similar.  I’ve got multi-threaded scaling issues to work out as well.  Meanwhile, here’s my 2002 JavaOne slides on the topic:

http://www.azulsystems.com/events/javaone_2002/microbenchmarks.pdf

Cliff

P.S. A

Published by

wpengine

This is the "wpengine" admin user that our staff uses to gain access to your admin area to provide support and troubleshooting. It can only be accessed by a button in our secure log that auto generates a password and dumps that password after the staff member has logged in. We have taken extreme measures to ensure that our own user is not going to be misused to harm any of our clients sites.