# A Plea For Programs

[Update 9/21/2008: I’ve got a simple sample program for people to port, plus at least some code for Clojure, JavaScript/Rhino, JPC and JRuby.  I’m still missing Scala, at least.  Thanks!]

I would like some non-Java Java-bytecode programs to do performance testing, for a talk I’m giving this coming Friday (my bad for starting this late), and I’m hoping my gentle readers can supply some.  I’d like programs in different languages, but ones that are easy to set up and run.  I’m going to do internal JVM profiling, so I’m not all that concerned with the output or “Foo-per-second” results.  Ideally, my programs would be:

• Non-Java.  Clojure, Scala, JPython, JRuby all come to mind.  The more variation, the merrier!
• Easy setup.  I’m not an expert in any of these, so the resulting program has to be easy to set up and run.  Preferably a simple “java -cp Weirdo.jar FunnyProgram” command line.
• Plain JVM.  The program has to launch from a plain ‘java’ command, because I intend to profile on Azul Systems’ own JVM.  Any odd-ball jar or class files are fine.
• Long enough.  The program has to run for several minutes at least, without “babysitting”.  Long enough for the JIT to settle down (if it’s going to), and long enough for decent profiling.
• Little I/O.  Besides DBs being a pain to set up, I’m really looking for CPU-bound programs.  Plain file I/O is fine, if the files are provided and can be scripted easily (e.g. “java -cp Weirdo.jar FunnyProgram < BigInput.dat > /dev/null”).
• Be multi-threaded.  Not a requirement, but a definite nice-to-have.  Several of these languages support alternative threading & coherency models and I’d like to test these features.
• Be Open Source, so I can post the collection for others to compare against.  This is NOT a hard requirement; I’m happy to keep private anything you ask me to.  Performance profiling data will be released, as that is what the talk is about!  (I’m also fine with signing NDAs, but that’s probably not going to be an issue with this crowd.)
• An example: A multi-threaded Mandelbrot program would be fine, computing a 1000×1000 grid of points centered around (1.0,1.0) with a spread of (1.0,1.0) – so fill in the grid (0.5,0.5) to (1.5,1.5), using your choice of thread controls.
• Please include your name(s), so I can give credit where credit is due.
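The Mandelbrot example above could be sketched in Java roughly as follows, as a baseline for ports.  This is a hypothetical illustration, not a submitted program: the class name `Mandelbrot`, the escape cap `MAX_ITER = 1000`, the one-task-per-row split over a fixed thread pool, and the command-line arguments are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical multi-threaded Mandelbrot sketch: one pool task per grid row. */
public class Mandelbrot {
    static final int N = 1000;              // 1000x1000 grid
    static final double CX = 1.0, CY = 1.0; // grid center
    static final double SPREAD = 1.0;       // total width and height of the grid
    static final int MAX_ITER = 1000;       // assumption: escape-iteration cap

    /** Escape-time count for c = (cr, ci); stops once |z|^2 > 4 or at MAX_ITER. */
    static int escape(double cr, double ci) {
        double zr = 0.0, zi = 0.0;
        int n = 0;
        while (n < MAX_ITER && zr * zr + zi * zi <= 4.0) {
            double t = zr * zr - zi * zi + cr; // real part of z*z + c
            zi = 2.0 * zr * zi + ci;           // imaginary part of z*z + c
            zr = t;
            n++;
        }
        return n;
    }

    /** Fills the grid from (0.5,0.5) toward (1.5,1.5) using the given thread count. */
    static int[][] compute(int threads) throws Exception {
        int[][] grid = new int[N][N];
        double x0 = CX - SPREAD / 2, y0 = CY - SPREAD / 2; // lower-left corner (0.5, 0.5)
        double step = SPREAD / N;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> rows = new ArrayList<>();
            for (int y = 0; y < N; y++) {
                final int row = y;
                rows.add(pool.submit(() -> {
                    double ci = y0 + row * step;
                    for (int x = 0; x < N; x++) {
                        grid[row][x] = escape(x0 + x * step, ci);
                    }
                }));
            }
            for (Future<?> f : rows) {
                f.get(); // wait for every row; rethrows any task failure
            }
        } finally {
            pool.shutdown();
        }
        return grid;
    }

    public static void main(String[] args) throws Exception {
        int threads = args.length > 0 ? Integer.parseInt(args[0])
                                      : Runtime.getRuntime().availableProcessors();
        int reps = args.length > 1 ? Integer.parseInt(args[1]) : 100; // raise to run longer
        long total = 0;
        for (int r = 0; r < reps; r++) {
            for (int[] row : compute(threads)) {
                for (int v : row) total += v;
            }
        }
        System.out.println("total iterations: " + total); // keeps the work observable
    }
}
```

Something like “java -cp . Mandelbrot 8 1000” would run it on 8 threads for 1000 passes.  Note that every point in this region lies outside the Mandelbrot set (the whole set sits at Re(c) ≤ 1/4), so each point escapes within a handful of iterations and a single pass is cheap; the repetition count is the knob for reaching a several-minute run.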

I hope to discover things like:

• How closely does “plain code” match the JVM/JIT’s expectations?  How well does the JIT turn “plain code” into machine instructions?  I hope to present the JIT’d code for sample language constructs and detailed profiling data.
• How well does the function-call logic match the JVM/JIT’s expectations?  Can trivial functions be inlined?  What’s the cost of a not-inlined function-call?
• Other interesting costs?  (e.g., endless new-Class churning, endless new-bytecode churning causing endless JIT’ing; endless new weak-ref or finalizer creation causing GC grief, etc)
• How well does the alternative threading & coherency scale?  Can Mandelbrot run on a thousand CPUs?  (I expect: trivially yes). How about programs with more interesting coherency requirements?
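One way to probe the function-call questions is to compare a call site that only ever sees one receiver class with one that sees several.  HotSpot-style JITs commonly inline call sites whose type profile shows one or two receiver classes and fall back to a real virtual call when they see more.  The sketch gives each case its own method, and therefore its own call site, so the two can be compared in a profile; the class, interface and method names are hypothetical.

```java
/** Hypothetical call-cost probe: one call site fed one receiver class vs. three. */
public class CallSites {
    interface Op { long apply(long acc, int i); }

    static final class Add implements Op { public long apply(long acc, int i) { return acc + i; } }
    static final class Xor implements Op { public long apply(long acc, int i) { return acc ^ i; } }
    static final class Mul implements Op { public long apply(long acc, int i) { return acc * 31 + i; } }

    // Call site used only with a single receiver class.
    static long runOne(Op[] ops, int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc = ops[i % ops.length].apply(acc, i);
        return acc;
    }

    // Same body, separate call site, used with three receiver classes.
    static long runMany(Op[] ops, int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc = ops[i % ops.length].apply(acc, i);
        return acc;
    }

    public static void main(String[] args) {
        int reps = args.length > 0 ? Integer.parseInt(args[0]) : 1_000; // tune for run time
        Op[] one = { new Add() };
        Op[] many = { new Add(), new Xor(), new Mul() };
        long check = 0;
        for (int r = 0; r < reps; r++) {
            check += runOne(one, 10_000_000) + runMany(many, 10_000_000);
        }
        System.out.println(check); // print so the JIT cannot discard the loops
    }
}
```

Whether either site actually gets inlined is up to the JIT; the point is to line up the JIT’d code and profile ticks for `runOne` against `runMany`.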

I put a sample Java program here, if you’d like to port something really simple.  The inner loop of this program looks like: “for( i=0; i<1000000; i++ ) { sum += ((int)(sum^i)/i); }”.  The JIT’d assembly code from HotSpot’s server compiler looks like this, unrolled a few times:

```
 2.83%  243  0x12d93878  add4  r5, r4, 1           // tmp=i+1; unrolled 8 times, this is #1
 0.06%    5  0x12d9387c  xor   r3, r5, r1          // sum in r1, tmp in r5
 0.06%    5  0x12d93880  beq   r5, 0, 0x012d93b40  // zero check before divide
 0.35%   30  0x12d93884  div4  r0, r3, r5          // divide, notice cycles on next op
 2.64%  227  0x12d93888  add4  r1, r0, r1          // sum += (sum ^ tmp)/tmp
```
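For anyone porting it, the quoted inner loop can be wrapped into a stand-alone Java program along these lines.  This is a hypothetical stand-in, not the linked sample: the class name `SampleLoop`, the `long` type of `sum`, the outer pass count, and starting `i` at 1 are assumptions.  As quoted, the loop starts at `i=0`, where the integer division would throw `ArithmeticException` on the very first iteration; the “zero check before divide” in the listing is the JIT’s guard for that case.

```java
/** Hypothetical stand-in for the sample program's hot loop. */
public class SampleLoop {
    // One pass of the quoted loop.  Assumption: i starts at 1, because
    // i == 0 would make the first integer division throw ArithmeticException.
    static long pass(long sum, int n) {
        for (int i = 1; i < n; i++) {
            sum += ((int) (sum ^ i) / i); // xor, narrow to int, divide, accumulate
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumption: pass count is a guess; tune it to get a several-minute run.
        int passes = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        long sum = 0;
        for (int p = 0; p < passes; p++) {
            sum = pass(sum, 1_000_000);
        }
        System.out.println(sum); // print the result so the work is not dead code
    }
}
```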

As expected, there’s a pretty direct mapping from the source code to the machine code.  I’d like to see how other JVM-based languages stack up here. Email me directly with small programs, or post links here.

Thanks!
Cliff