2009 ECOOP

Long delayed, but at last I found time to publish my notes.

I got a free all-expense-paid trip to Italy, and all I had to do was give a keynote speech (slides) at 2009 ECOOP.  It’s a nice gig if you can get it.  Travel to Genoa from San Francisco takes a while; my 8:30am SF flight got me into Genoa about 2pm the next day – plus 2 hrs lead time at SF, plus a 1hr drive to SF – so I started my “day” at 6am on Friday and ended it at 8pm on Saturday.  I managed to stay up till dinner (barely) then crashed pretty hard.  Sunday was my day for sightseeing; I wandered pretty far all around Genoa; rode some funiculars; walked all over; took loads of pictures; ate weird food and generally acted like a typical American tourist.

There are loads of sidewalk vendors aggressively selling fake designer purses and sunglasses laid out on sheets on the sidewalk.  Monday afternoon I got to watch a bust go down: lots of yelling, the vendors grabbed their sheet bundles and took off running with plain-clothes police in hot pursuit.  Somebody lost his wares and the police quit chasing to grab the loot.  It’s Tuesday evening and the vendors haven’t returned.  I promised my wife I’d get her a cheap Gucci purse and now I’m regretting I didn’t do it earlier.

Hotel Bristol is a nice older place (means: lots of paint chips and paint layers, big chandeliers, gold-leaf trim, dark wood, creaky old elevators; rooms are a mix of really old and really new; follow the link and ogle the pictures).  My room is vast with 15ft ceilings (with chandelier of course) and a large jacuzzi tub; enough room to put up a basketball hoop.  The conference food has been pretty nice (free good wine with every meal); lunches are served in the Palazzo Ducale – the historic residence of the Doges.  I can’t do the setting enough justice; the website is nice but the pictures don’t really tell the story.  50ft ceilings with 20ft chandeliers and a dozen 8ft marble statues; gold-leaf trim on every fresco; stunning paintings the size of basketball courts…  I totally lay down on the floor and stared at the ceiling for a good hour.  The local restaurants, on the other hand, are a hit-or-miss affair; food is only served at certain hours; sometimes the food has been very good and sometimes no better than the local deli around the corner from Azul.

The ICOOOLPS workshop was on Monday; I got invited to talk there at the last minute.  I had some misgivings but most of the papers at the workshop were pretty nice; I enjoyed myself.  I’m still fighting jet-lag (today is Tuesday) pretty hard, so I’m writing this to try and stay awake until dinner.  The papers are here.

1st up – The ICOOOLPS workshop.  ECOOP notes are further down.  As usual I abbreviate ruthlessly; skip some talks; take notes in a stream-of-consciousness style.  Caveat Emptor.  And before I forget:

The “Nick Mitchell SimpleDateFormat Challenge”: take a common SimpleDateFormat pattern and JIT specialized code to translate Date objects to ASCII efficiently.  Similar to C/C++ compiler optimizations for ‘printf’ format strings.

Let me know when you get something working.
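
To make the challenge concrete, here’s a minimal hand-written sketch (mine, not Nick’s) of what the JIT’d result for one common pattern could look like – “yyyy-MM-dd” compiled down to direct Calendar field fetches, with no pattern string left to re-parse at runtime:

import java.util.Calendar;

final class CompiledDateFormat {
  // What a pattern-specific compiler might emit for "yyyy-MM-dd":
  // no pattern interpretation per call, just field fetches and appends.
  static String formatYyyyMmDd(Calendar c) {
    StringBuilder sb = new StringBuilder(10);
    pad(sb, c.get(Calendar.YEAR), 4).append('-');
    pad(sb, c.get(Calendar.MONTH) + 1, 2).append('-'); // Calendar months are 0-based
    pad(sb, c.get(Calendar.DAY_OF_MONTH), 2);
    return sb.toString();
  }
  private static StringBuilder pad(StringBuilder sb, int val, int width) {
    String s = Integer.toString(val);
    for (int i = s.length(); i < width; i++) sb.append('0'); // zero-pad to width
    return sb.append(s);
  }
}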

Towards an Actor-based Concurrent Machine Model

“Delegation-based” machine model; multi-dimensional separation of concerns (“MDSOC”).

Machine model: objects, msgs, delegation.  High lvl objects represented as at least 2 low-level objects; a “proxy” and a real object.  Indirection to allow message interception.  All msg-sends get forwarded thru the proxy (“message” here is a java-lvl function call, not an OpenMPI msg – but could be either in a distributed system).
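
For flavor, the proxy/real-object split can be mimicked in stock Java with java.lang.reflect.Proxy – my sketch of the shape of the thing, not their machine model:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface Account { void deposit(int amt); }

final class RealAccount implements Account {          // the "real object"
  int balance;
  public void deposit(int amt) { balance += amt; }
}

final class Interceptor implements InvocationHandler { // the "proxy" side
  private final Object target;
  Interceptor(Object target) { this.target = target; }
  public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
    System.out.println("intercepted: " + m.getName()); // msg-send interception point
    return m.invoke(target, args);                     // delegate to the real object
  }
}

// Usage: every send to 'a' is forwarded through the Interceptor first.
// Account a = (Account) Proxy.newProxyInstance(
//     Account.class.getClassLoader(), new Class<?>[]{ Account.class },
//     new Interceptor(new RealAccount()));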

Aspect-Oriented-Programming – intercept msg-send links (between Class-proxy and methods in the actual Class-body), etc.  How to do CHA with basically dynamic sub-class insertion?  Just re-JIT in the new class hierarchy?

Ok – now the new stuff… they want concurrency in the delegation function.  An Actor is a collection of local objects (and ptrs to remote objects).  Fcn calls between local objects are fast; fcn calls across actors are basically remote-msg sends.  Receiving a msg invokes a co-routine in the remote actor.

Ugh, using ‘yield’ with the co-routines.  Sounds buggy to me: the deadlock semantics will depend on correctly calling ‘yield’.

(assuming a VM w/ actors not thread support)


An Efficient Lock-Aware Transactional Memory Implementation
Justin Gottschlich

Trying to integrate into the “boost” STM system.

Locks+TM break things.
But locks are prevalent – must be able to compose with them.

Example: Locks outside Transaction
– T1 runs lock; T2 runs transaction
Example: swap code.  Works with locks-only or TM-only, but breaks if mixed (see the sketch after this list).

Locks inside Trans
– T1 runs trans, calls lock; T2 runs either trans or lock
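
Here’s my sketch of the swap hazard (the ‘atomic’ helper is a hypothetical stand-in for the STM entry point, not Boost.STM’s API): the transactional swap never acquires the lock, so its reads and writes can interleave with the lock-based swap and tear the invariant.

final class Pair {
  int x = 1, y = 2;                     // invariant: {x,y} is always {1,2}
  final Object lock = new Object();

  void swapWithLock() {                 // T1: lock-based swap
    synchronized (lock) { int t = x; x = y; y = t; }
  }
  void swapWithTxn() {                  // T2: transactional swap -- note it
    atomic(new Runnable() {             // never acquires 'lock', so it can
      public void run() { int t = x; x = y; y = t; } // interleave with T1
    });
  }
  static void atomic(Runnable r) { r.run(); } // placeholder; a real STM would
}                                             // log, validate & commit here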

Prior work: “full lock protection” – if you take a lock, then the XTNs are all stopped/aborted/etc and you only get locking behavior.

Offers a programmer solution: the programmer lists lock/XTN conflicts and the XTN system deals with it when you take a conflicting lock.  (I believe this is unrealistic.)


Eric Jul? – Whiteboard not slides….

Blurring the line between compilers & runtimes

Gives some examples already done by hotspot

Nice discussion, but nothing new.  Mostly trying to get a discussion going about crossing compilers/JITs & runtimes.  He might not have been aware of what the JIT already does.


Trace-Based Type Specialization in JavaScript
Andreas Gal

Same basic talk as given at PLDI.  TraceMonkey? Firefox 3.5.
JavaScript & Flash instead of Java/C/C++

JS has been very slow (interpreted only).  But is here to stay; very popular & growing.  ActiveX & client-side Java dying out (not sure about client-side Java which was never popular in the 1st place).

Static typing makes life easy, but dynamic typing is required.  More complex data types and more runtime tests.  Tag bits to be checked; overflow to Double, etc.

Coming around to wanting a HotSpot-like dynamic JIT based on the types that happen to be true at the moment.  Basically, types in program traces remain stable over time.

  for( var i=0; i<100; ++i ) { /* nothing */ }

compiles (naively) to the boxed loop:

loop:
  if( int(i) >= 100 ) break;
  i = box(int(i)+1);
  goto loop;

So the trace can record, e.g., that ‘i’ has always been an ‘int’ so far.  Traces have guards on the input types and are type-specific.  Function calls are inlined into the traces, along with guard statements to check that you are taking the same control flow.

Tracing loops, and can verify trace is type-stable across loop.  So can remove e.g. boxing in the loop.
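
My Java-flavored sketch of what a type-stable trace buys (the side-exit helper is hypothetical): once the entry guard proves ‘i’ is an int, all the boxing in the loop body drops out.

final class Trace {
  static Object runTrace(Object i) {
    if (!(i instanceof Integer)) return sideExit(i); // entry type guard
    int t = ((Integer) i).intValue();                // unbox ONCE, at trace entry
    while (t < 100) t = t + 1;                       // unboxed compare & add, no tag checks
    return Integer.valueOf(t);                       // re-box only when leaving the trace
  }
  static Object sideExit(Object i) { return i; }     // placeholder: interpreter resumes
}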

But real traces have many exits and the many exits are really taken.  So trace along each exit.  Build trace-trees, rooted at loop headers (exponential growth of trace trees as the loop DAG is split out?).  Can only link back into original tree if types all match up – else need a new loop /tree header.

Suffering from the lack of a language spec; JS programs are driving the language semantics (i.e., with a new interpreter some JS programs fail, so we call the interpreter buggy – even if it meets the loose “spec”).  No notion of a memory model or threading; lots of other holes in the language.  Echoes of what Java went through: the popular implementations defined the spec.


Tracing the Meta-Level: PyPy’s Tracing JIT Compiler
Carl Bolz

PyPy is a tracing JIT compiler.  Now apply this tracing JIT to an interpreter.
Compiler “Restricted Python” RPython – can target C, Java, .Net
Various interpreters: python, prolog, smalltalk, scheme, etc

Basic idea:
trace loops; look for type-stable loop execution; look for similar code path loops.

But the dispatch loop in interpreters means you never execute the same loop code twice (because each time you are running a new, different bytecode).  Goal: trace the user-mode program, not the language interpreter.  Effectively, the tracing JIT unrolls the bytecode dispatch loop.  Provide 3 hints to the language interpreter: a position key (here is the language interpreter’s BCI), here are the backward edges, and here is the PC modification.
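
Sketched on a toy dispatch loop in Java (the hint names are borrowed from PyPy’s JitDriver; the interpreter itself is my invention):

final class ToyInterp {
  void run(byte[] bytecode) {
    int pc = 0;                          // the language interpreter's own PC
    while (true) {
      // jit_merge_point(bytecode, pc): the position key -- the same
      // (code, pc) pair means "same user-program position", so traces line up
      int op = bytecode[pc++];
      switch (op) {
        case 1: /* ADD: ... */ break;
        case 2: /* JUMP_BACK */
          pc = bytecode[pc];
          // can_enter_jit(bytecode, pc): a backward edge in the *user*
          // program; start tracing here, unrolling this dispatch loop away
          break;
        default: return;                 // HALT / unknown opcode
      }
    }
  }
}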

Works fairly well to clean out the language interpreter from the traces.
Then get traces which are fairly clean & can JIT them.

Bears resemblance to partial evaluation, arrived at by different means.  Future work: better optimization of traces; some escape analysis to remove boxing operations.  Optimize frame objects, apply to larger programs.


Faster than C#: efficient implementation of dynamic languages on .NET
Antonio Cuni

Trying to make e.g. Python faster on .Net.
Looking at IronPython, Jython, JRuby, Groovy
(also Self, JavaScript/TraceMonkey/V8)

Why so slow?  Hard to compile efficiently; lack of type info at runtime; VMs not optimized to run them.  .Net is a multi-language VM, right (sure, as long as the language is C# – his quote, not mine!).  JVM is in better shape, but still heavily optimized for Java.

JIT compiler?  Wait until you know what you need; interleave compile-time and runtime; exploit runtime info.  JIT layering; fight the VM…

PyPy: a JIT compiler generator; Python semantics “for free”.  The JIT frontend is not limited to Python; JIT backends: x86 or CLI/.NET.  Fun games with partial eval: assume e.g. the Python bytecodes to be constant & constant-prop them into the Python interpreter.

Does even more fun things with constants than HotSpot+OnStackReplacement – totally doing speculative constant-value JIT’ing: if this argument has value ‘3’ then here is the JIT’d code.  The trick is to pick which variable to constant-speculate on (and to get that speculated value as well).

Not yet doing all of Python, but getting really great speedups.


Strata Virtual Machine

Software Dynamic Translation – read a binary, & translate it/jit the translated code.

Using a code-based hash-table lookup.  Hash; jump to hash-table entry.  Miss: jump to ‘strata’ interpreter
Hit: jump to bucket; bucket checks target (like HS inline cache, check at target)

HotSpot could use this to make more efficient v-calls?  Hash; indirect jump to nearby code table; table jumps to target & checks target; expect no misses after warmup.  Nah… still got indirect branch in there.
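
Roughly this lookup structure, sketched in Java (mine, not Strata’s code):

final class TranslationCache {
  static final int SIZE = 1024;                  // power-of-2 for cheap masking
  final long[]     keys  = new long[SIZE];       // original code addresses
  final Runnable[] frags = new Runnable[SIZE];   // translated code fragments

  Runnable lookup(long target) {
    int h = (int) (target & (SIZE - 1));         // hash; jump to hash-table entry
    if (keys[h] == target) return frags[h];      // hit: bucket checks the target
    return translate(target, h);                 // miss: back to the 'strata' translator
  }
  private Runnable translate(long target, int h) {
    Runnable frag = new Runnable() { public void run() { /* emitted code */ } };
    keys[h] = target; frags[h] = frag;           // install for next time
    return frag;
  }
}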

Looks like standard dynamic binary translation stuff (converting PPC instructions to Java bytecodes?).


Automatic Vectorization in JIKES RVM

Using SIMD ops on X86.  Can’t use BURS/BURG to pattern-match vector ops.  Unroll loops to make the patterns more obvious.
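
The unrolling idea, sketched (my example, not the paper’s code): after 4x unrolling, four independent stride-1 adds sit side by side – an easy pattern to match with a single 4-wide packed add.

final class Vec {
  static void add4(int[] a, int[] b, int[] c, int n) {
    int i = 0;
    for (; i + 3 < n; i += 4) {
      c[i]   = a[i]   + b[i];     // four adjacent, independent, stride-1 adds:
      c[i+1] = a[i+1] + b[i+1];   // exactly the shape a pattern-matcher can
      c[i+2] = a[i+2] + b[i+2];   // turn into one packed add (e.g. PADDD)
      c[i+3] = a[i+3] + b[i+3];
    }
    for (; i < n; i++) c[i] = a[i] + b[i];  // scalar cleanup for the tail
  }
}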

Basically getting some SIMD stuff to work (but talk given by PC not by author, and PC believes this is not the right way to discover SIMD ops).  I also believe this… it’s not a clean fit to BURS.

Code emit via simple bitmask/shift/and/or.


JIT Compilation on ARM Processors
Michele Tartara

ARM – 31 32-bit GPRs, 16 visible at a time?  SP, LNK, PC are GPRs.  Fixed-format 32-bit ops.  Dynamic fast compilation (cell-phone targets).  No BURS or tiling, just greedy rules.  This is probably always the right way to go.


ECOOP 2009 Proper Starts

ECOOP is being held in the Palazzo Ducale (Ducal Palace) – the main conference presentation chamber has perhaps 40ft ceilings, acres of gold leaf on the walls in between the massive medieval paintings.  The speaker dais was clearly meant to hold a throne or the orchestra; it’s a circular marble dais perhaps 20ft in diameter with marble balustrade and marble railings.  The high-tech screen & projector setup in the middle is really anachronistic.

The adjacent formal ballroom is much larger; 50ft+ ceilings; chandeliers of at least 20ft tall and 30ft in diameter; paintings & gold leaf in plenty plus also perhaps a dozen 8ft marble statues with ancient greek themes.


Keynote – Classes, Jim, but Not as we Know Them
Simon Peyton Jones

As usual for Simon, he gave a wonderful presentation.

History – Haskell is 20yrs old.  Lots of fun with new languages & machines & ideas (Functional Programming).  After Backus’s Turing Award lecture opened the gates, a storm of new languages hit the field.  Took 10 yrs before Haskell formed out of the chaos of new languages.

Lifetime of most programming languages:

  • Invented by 1 person, used by 1 person, dies within a year.
  • Slow death version: invented by 1 person, used by 100, dies in 5 yrs
  • Successful: used by 100K+ programmers, never dies, e.g. Fortran still alive
  • Haskell?  hovering around 1000 programmers, but sharp recent uptick?

Haskell might cross over into an immortal language (100K+ programmers)?  Hackage: user-uploaded libs, reaching 300 new libs uploaded per month, and approaching 1 million downloads.

Type Classes-
Shows examples of lots of things you want actions on a wide variety of types:

  • equality (for set membership?)  how to do equality of functions?
  • ordering (sorting lists)
  • print/showing
  • serialization
  • computing hash function

(so far sounds like Java interfaces)
For all ‘a’ such that ‘a’ is an instance of class ‘Num’:

square :: Num a => a -> a
square x = x * x

Num classes have the following methods: +,-,*,/, etc.
The implementation of Num classes binds ‘+’ to e.g. ‘plusInt’ or ‘plusFloat’.

Implementation is perhaps more clever than Java invokeinterface bytecode lookup: is passed in a vector of fcns, of the correct type of interface ‘Num’.  The fcn lookup happens on this particular ‘vtable’ – call it an itable, but it’s really a unique v-table per interface.
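
In Java terms the dictionary-passing translation looks roughly like this (my sketch, hypothetical names) – the “dictionary” is just an explicit vtable argument the compiler threads through:

final class Dictionaries {
  interface NumDict<A> { A plus(A x, A y); A times(A x, A y); } // vtable for 'Num'

  static final NumDict<Integer> INT_NUM = new NumDict<Integer>() { // '+' bound to plusInt, etc.
    public Integer plus (Integer x, Integer y) { return x + y; }
    public Integer times(Integer x, Integer y) { return x * y; }
  };

  // 'square :: Num a => a -> a' becomes an ordinary method taking the
  // dictionary as an extra, compiler-inserted argument:
  static <A> A square(NumDict<A> num, A x) {
    return num.times(x, x);          // one indirection off the passed-in vtable,
  }                                  // no per-call interface search

  public static void main(String[] args) {
    System.out.println(square(INT_NUM, 7));   // prints 49
  }
}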

For Haskell, this is all done with syntactic sugar over basic Haskell.  Which itself is very cool (e.g. the Haskell compiler (javac equivalent) is doing interface calls w/syntactic sugar).

CLIFF NOTES: This is a faster way to do Java interface calls!!!  Or maybe this is how they are impl’d?  Find the i-table from a list of implemented interfaces, then pull out the correct fcn from the i-index.  In his world the i-table is pulled out early and kept pulled out, which avoids the point-by-point lookup of interfaces.  Ahhh…. he’s statically computing the i-table ahead of time during compilation, using some combo of type unification and the closed-universe assumption.  Not applicable to Java.

Able to handle fcns which themselves take arguments of fcns with any signature, as long as the fcn handles the correct interface.

Doing type-based dispatch instead of value-based dispatch.  Not the same as Java interfaces…
In Haskell, can bind a value arg to multiple interfaces all at once.
In Haskell, existing classes can be made instances of new interfaces (duck typing)

Haskell has NO subtyping, but uses the interface (well: type class) notion instead.

Haskell: types are typically inferred; type annotations occasionally needed.
Java: types are declared explicitly; inference occasionally possible

Haskell does type unification, so for things like Java’s ‘equals’ call – which is normally a fcn of (Object,Object)=>Bool – in Haskell we KNOW ahead of time that the 2 args are of the exact same type.  Hence the implementations of the various equals calls do not need to ask the “instanceof” question that all Java equals implementations start with.


Making Sense of Large Heaps
(Nick Mitchell, IBM)

Where does the memory go in large Java apps?

Example from 2008, Cobol to Java.  In Cobol: 600 bytes; in Java 4492 bytes.
Blow up from delegation, bulky collections, or too many data fields.

Size examples: 58M objects, 3Gig ram, 8000+ types.

Yeti Tool – does smart cluster of datatypes (String w/char[], or HashMap w/HashMapEntry[], HashMapEntry, HashMapEntrySet).

Yeti also does dominance of sharing; notices single-owner root/tree bits, and then breaks out the shared parts at the leaves.  Also detect e.g., similar kinds of sharing, such as array B[] points to a set of B’s; but also a linked list of Entry’s points to the elements of B[] in turn.  Really lumping together things with the same owner; maximal dominance w/same ownership relationship.  Does nice heap shape summaries, followed by size info.

The picture is really an alternating sandwich effect, alternating between the Collections you selected and the user data at that layer of the ‘sandwich’.  Edges hold the various sizes of things.  They fold up collection ‘backbones’.

If we get the data structures right, can easily shrink heaps by 2x to 10x!!!  Huge speedups possible.  Shows examples of people using CHM (ConcurrentHashMap) to hold 1 element (but the programmer doesn’t want to think about expected size, so does the easy thing and grabs a CHM).
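
The one-element-CHM smell, for the record (my example, not his):

import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class FootprintExample {
  // A default ConcurrentHashMap allocates its full internal machinery
  // (segments, hash table) even when it will only ever hold one entry:
  static final Map<String,String> HEAVY = new ConcurrentHashMap<String,String>();
  static { HEAVY.put("mode", "fast"); } // one mapping, dozens of internal objects

  // If the expected size really is ~1, an immutable singleton map is a tiny
  // fraction of that footprint (and thread-safe because it never changes):
  static final Map<String,String> LIGHT = Collections.singletonMap("mode", "fast");
}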


Scaling CFL-Reachability-Based Points-To Analysis Using Context Sensitive Must-Not-Alias Analysis

Basically computes a must-not-alias approximation; where it holds, do not bother to run the more expensive pts-to analysis.

Some experiments on medium programs (SpecJVM – 7 progs, Dacapo – 4 progs, 8 other progs).  Can compute a pts-to graph that is 2x to 5x smaller.  Compute time for the pts-to analysis is 3x faster on average than prior work (but still in the several-minutes range)… but what do you do with the info?  (This is my standard question when presented with better/faster/new-&-improved points-to work: now that you have the info, how much does it really help speed up programs?  Especially compared to having strong typing in the first place.)

Nice presentation of a pts-to analysis, but hit right at my sleepy jet-lag afternoon siesta; had trouble staying awake.


Intel NePalTM – Design and Implementation of Nested Parallelism for Transactional Memory Systems.  (Intel)

The issue is using locks inside an XTN; the locks are for the nested parallelism, but you want the whole operation to be atomic.  Azul makes no attempt to use the HTM to span across threads.  It can’t work for us, because the threads need to communicate through shared memory – and that shared memory will blow out the HTM.

This is Intel’s compiler-gen’d STM system for C++.  Handles either optimistic (read-friendly) or pessimistic (write-friendly) concurrency; roll-back & retry.

Sigh – shows basically no speedup over plain locking.

(and perhaps not the best name….)


Debugging Method Names

What’s a naming bug?  A mismatch between the name and what the code does.

Break names apart (CamelCase in Java), do some grammar abstraction.
getLastName ==> get – last – name  ==> (verb) (adj) (noun), etc

Then also do semantic analysis of the code.  Look for things like: has a loop, has an incoming parm, returns a value (from multiple points), includes some run-time checks in the loop, etc.  Decide it’s a search loop, returning a result or NULL.

So come up with lots of data points for names & methods.
Then mine the rules from a large corpus of code.

E.g. a “contains-XXX”-named method should NOT have a void return, because a “contains-XXX” name implies a question with an answer.  In fact, it probably ought to return a boolean.

Then apply the rules to a program and report violations.
Turns out it’s pretty easy to suggest candidate replacement names.

Example:
public void isCaching(boolean b) { this.caching = b; }

The method isn’t asking a question, it’s setting a value.
The tool suggests the name should be “setCaching”.

Example:
public boolean equals( Object o ) {
  if( this == o ) return true;
  if( o instanceof Value )
    return equals((Value)o);
  return equals(new Value(o.toString()));
}
So not really an ‘equals’ method, because it makes a new ‘Value’ object

More examples; ends up finding interesting bugs as well.  The hard part is deciding whether to change the name or change the code.  Sometimes the name is correct, but the code is ‘wrong’ here – and also wrong elsewhere.  Basically finding a poor factoring of code.


Mining & Recommending API Usage Patterns

APIs & libraries – often complex; lots of classes & methods; hard to use.

People use: books & docs, forums & newsgroups, code-search engines.

Example: we wish to add an item to an Eclipse menu.  Browse docs; find a potential call: “appendToGroup”.  Google search finds 151 code snippets (at time of paper submission) and 287 code snippets (at time of talk).  The returned code snippets show at least 2 different API usages.

The ‘MAPO’ tool does the search; parses the code; clusters snippets based on usage; mines patterns; recommends final patterns.  The tool needs to inline non-3rd-party methods (scattered implementation).

Examples hard to follow…  I’ve been bitten here before repeatedly (complex API & no way to learn how to use it easily).  So very sad that it’s so hard to follow his talk.


Supporting Framework Use via Automatically Extracted Concept Implementation Templates

Another talk about finding & figuring out reuse of existing code.

The choice between templates vs documentation had much less impact on dev time than the task’s concept complexity.


I skipped all the refactoring talks.  Exhaustion is setting in, plus I’ve seen one or two of these talks before.

Went to a couple of talks attempting to deal better with constructors.  Mostly it’s issues like publishing objects before construction completes, or (in Java) calling an overridden v-call from a constructor – which means the sub-class override executes before the sub-class constructor has had a chance to work on the object.  Final fields, read-only objects still being constructed, not-null field properties before you set the value, et al; there’s a bunch of problems that stem from constructors not being instantly quick – and if they take time, what does the object look like in that in-between state?
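
The classic Java instance of the problem (my example):

class Base {
  Base() { report(); }                  // v-call from the constructor...
  void report() { System.out.println("Base"); }
}
class Sub extends Base {
  int size = 10;                        // assigned only AFTER Base() finishes
  @Override void report() {
    System.out.println("size=" + size); // runs on the half-built object
  }
  public static void main(String[] args) {
    new Sub();                          // prints "size=0", not "size=10"
  }
}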

This is a general language issue with constructors.  They are a nice notion, but unless they are instantly quick, you have to live with having objects which are not ‘complete’ or ‘constructed’ yet and hence do not meet the expected invariants for the object.


Inlining Security Policies

Want to inline monitor code into the original program; the new program either runs or is truncated (if a security violation happens).  Must show no new behaviors from the inlining, and that all old behaviors are allowed (except the security-violating ones), etc.  Single-threaded versions of these inlinings all exist and are well documented.

Multi-threaded is harder: the general case isn’t solvable (probably because the monitors themselves become non-deterministic).  But if the monitor is race-free (and has some other easy properties), then you can do it.  The monitor is normally a state machine.
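
A toy single-threaded version of such an inlined monitor (my sketch; the policy is the classic “no send after read”):

final class InlinedMonitor {
  enum State { START, READ_DONE }
  static State s = State.START;          // the monitor's state machine
                                         // (would need to be race-free for MT)
  static void beforeRead() { s = State.READ_DONE; }  // inlined before file reads
  static void beforeSend() {                         // inlined before net sends
    if (s == State.READ_DONE) {
      System.err.println("policy violation: send after read");
      System.exit(1);                    // truncate the program's execution
    }
  }
}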


Cliff Click Talks About Azul

My keynote went well enough; lots of Q&A from the senior people in the audience afterwards.  We had to stop the Q&A after running quite a bit over.  I also got a lot of positive comments afterwards, so I think people really enjoyed the talk.  I’ll add that I enjoyed both Simon Peyton Jones’ and Dave Ungar’s keynotes also.  This ECOOP was unusual in having 3 “gentleman hackers” give keynotes; all three of us are both very scholarly and very prolific programmers.  Doug Lea added in a later email:

“Cliff: definitely the best talk I’ve seen you give (which is quite a few). Someday  you’ll have to figure out how you packed in so much technical content yet still fully captured attention of people supposedly without the background to understand most of it.”

High praise from Doug Lea, so I musta done something right.   🙂

The trip home is uneventful, except for Paris’s Charles de Gaulle airport being the usual zoo.  I get up at 4:30am and am in a Genova taxi by 5am for the 1/2hr drive to the airport.  Except that there’s no traffic at this hour and the driver thinks he’s Mario Andretti.  We make the trip in about 10 minutes.

So I’m way early into Genova’s airport for a 7:10am flight.  No power connections for my laptop, for-pay wi-fi (not worth it), no shops open – not even coffee.  No airline partnership; I cannot check in for my USAirways flight from the Air France desk in Genova.  I blog for the next hour and a half.  The flight leaves on time and the two hours flying are very smooth.  Kudos to Air France.  Then we land at Charles de Gaulle and the madness begins.  We have to walk down the plane stairs and across the tarmac and about 1/4 mile more to the main terminal – but actually it’s the small-plane terminal.  After figuring out I’m not in Terminals 1, 2 OR 3, I get instructions to take bus Number 3 to the automatic train and take it to Terminal 1.  Of course the buses & bus stop are labeled – with LOTS and LOTS of numbers and plenty of French – and it takes a question to the drivers to figure out which is the #3 bus.

The bus ride is longish (for an airport bus) and uneventful, and then I find the train.  Thankfully the train is clearly marked, and again it’s a longer ride to Terminal 1 than I expect: Charles de Gaulle is a really spread-out airport.  Terminal 1 is a total zoo; I see lines of people snaking out all over, filling the large hall, hundreds of people long.  Three times I ask people about which line they are in; none of the lines are marked and all snake back and forth throughout the hall; there’s a steady stream of people traffic cutting through all the lines, and the lines branch and rejoin helter-skelter.  Finally I find the ‘entrance’ to the line for flight #771 to Charlotte – just next to the line for Philly, which clearly orbits the entire hall.  The Charlotte line is actually fairly short, maybe 20 people long, and after 10 minutes I’m talking to the agent.  No problem checking in, and then I head off for gate 55 (cutting through the Philly line again).  Then it’s customs (another 10-minute line), and then a small shopping zone – I figure I’ll come back to it for a snack (no breakfast yet; nothing was open in Genova).  But then I discover there’s yet another line for security.  This one is about 20 mins long and there’s no way I’m going back for my snack and then back through security.  I figure it’s like US airports – once past security there will be food and such… WRONG!  It’s just the end of the lines; I’m in a seating area with about 10 gates and absolutely no way to get a drink or snack – or a bathroom, near as I can tell.  It’s now 10:10am and it’s taken me a solid hour to navigate Charles de Gaulle and find my gate.  As I type, I’m standing by a laptop charging station (no chairs nearby), in the hopes I can do a little something on the laptop during the next 8 hrs of flight.

Later: the flight is fairly smooth.  They serve some meals; I have another flight through Charlotte, NC to San Francisco.  Home at last!  For about 24 hrs, that is; then I’m on vacation in Texas with my entire family.  More plane flights for me!  Yumm, I can already taste the small salty snacks they are going to serve…