Too Much Theory, Part 3

This is the 3rd and last part of this blog!  (thank heavens!), where I wax poetic about Lattices and Type Theory, and their applications to Compilers and in particular Java JITs. Part 1: Too Much Theory Part 2:  Too Much Theory, Part 2 ## A Quick Recap We’re building a lattice, to allow us to do exciting optimizations on Java programs.  Our lattice has distinguished top and bottom types, and is Commutative, Associative, and Symmetric.  We understand pointers can be null, not_null, or unknown (e.g. bottom).  Because of symmetry across the lattice centerline, we also have the dual of not_null which we’ll call any_null.  For this recap, we’ll also have Java Classes (e.g. String), but we’re going to ignore subclassing for the moment.  Our lattice looks something like this: Wow, that’s getting fancy…  lets revisit some of the elements in this lattice real quick: * bottom: All possible values, including null, as computed by running the program.  No constants. * String:bot: All possible Strings, including null, as computed by running the program.  No constants, no XMLNodes.  For brevity I will sometimes refer to this as String (no trailing :bot notation). * XMLNode:bot or plain XMLNode: All possible XMLNodes, including null, as computed by running the program.  No constants, no Strings. * bottom:not_null: All possible values, as computed by running the program.  No constants, no null. * String:not_null: All possible Strings, as computed by running the program.  No constants, no XMLNodes, no null. * String:hello_world: The constant String “hello world”, and nothing else. * null: The constant null. * String:any_null: All possible String constants, all at once.  No XMLNodes, nor null.  An impossible situation in a real program, but a useful mathematical notion when running Constant Propagation. * String:top: All possible String constants and the null constant, all at once.  No XMLNodes. * top: All possible constants, including all Strings and all XMLNodes and null, and all them all at once. Notice the symmetry: ever Node below the centerline of constants also appears in dual form above the centerline.  The edges are also symmetric: e.g. top has 4 edges out and bottom has 4 edges in, String:top has 1 in, 2 out, and String:bot has 2 in, 1 out, etc. This lattice will let us find the constants in code like this example from last time:

final class LinkedListThing {
  LinkedListThing _next; // instance field
  int hashCode() {
    int hash = 0;
    Object x = this; // x is LinkListThing:not_null
    while( x != null ) {
      hash += x.hashCode(); // x is LinkListThing:not_null
      if( x instanceof LinkedListThing )
         x = ((LinkedListThing)x)._next; // x is LinkedListThing
      else // with Conditional Constant Propagation...
         x = "fail"; // ...this path is dead
      // x is LinkedListThing:bottom and not a String
    }
    return hash;
  }
}

And Now Gentle Readers, Let Us Ponder Subtypes Java, of course, has subtypes. So does C++ and many other languages. Lets take a look at what it would take to add subtypes to our lattice. This tiny program snippet makes exactly a Hashtable:

  Hashtable x = new Hashtable(); y = x.get(); // Calls Hashtable.get

And this snippet makes a known subclass of Hashtable, but treats it like a Hashtable:

  Hashtable x = new Provider();  y = x.get(); // Calls Provider.get

And this snippet mentions an UNknown subclass of Hashtable:

  void foo( Hashtable x ) {      y = x.get(); // Calls ???

When does ‘Hashtable x’ refer to exactly a Hashtable, and when does it refer to some subclass of Hashtable? Knowing if some value is exactly a Hashtable versus a subclass is very useful: we can optimize calls made to exactly known classes. Example: What function is called when we call “x.get()”? Well, if x is exactly a Hashtable, we get Hashtable.get() … and we can inline “get()”. If x is a Hashtable-or-a-subclass, then “x.get()” might be Provider.get() or Hashtable.get() or some other user-specified derived version of “get()”, and we cannot inline. The job of figuring out if ‘x’ is exactly of class Hashtable, or some subclass of Hashtable falls to Constant Propagation – and that requires we represent this notion of ‘exact class’ in the lattice. Lets add the notion ‘Hashtable:exact’ to mean exactly a Hashtable, and NO subclasses, while plain ‘Hashtable’ allows subclasses. This is an independent axis from allowing null, so we can still have e.g. Hashtable:exact:not_null.  Note that final classes like String cannot subclass and so are always String:exact; constants are always the exact class that they are, e.g. Hashtable:exact:0x1234. I have another “Lattice” for you to ponder, with Hashtables instead of XMLNodes.  I’m still using bottom in places but I might as well use Object:bot or just plain Object.  Also, I ‘twisted’ the picture slightly: lattice elements in the middle all exclude null, and those on the outer edges all include null.  I had to duplicate the null element to make the graph lay flat visually, but really they are the same element.  I could have laid the graph ‘flat’ on a cylinder, but the web’s not up to 3-D visualization yet. . ## And Now The Punch Line And now for the Trick Question: what happens when I end up mixing together (with Phi functions on loop backedges) Hashtable:top with String:exact:not_null?  (A different question is how I come to such a situation that I attempting to mix these types… but trust me I’ve seen this happen in QA from suitably complex programs!)  So back to my Question: I am look for the most precise answer possible, anything less and I’ll be losing type information for no good reason.  So for example bottom is a correct answer, but very conservative – we can do better than that! How about bottom:not_null?  Since for Hashtable:top the compiler can pick any Hashtable it wants – we want it to pick a Hashtable, ANY Hashtable.  The result of mixing that with String:exact:not_null is … some random not-null Object: bottom:not_null.  This might let me remove a null-pointer check at compile time. But we could also take just the null from Hashtable:top, since the compiler is allowed to pick that instead!  Remember: the top label means the compiler “gets to choose” during the course of Constant Propagation – and picking a more precise answer makes it more likely we’ll find a constant and have our choice remain correct.  So mixing null and String:exact:not_null yields String:exact.  This might, e.g., let me convert an unknown x.hashCode() into a known String:hashCode() call. So which one do I pick?  bottom:not_null or String:exact?  Typically I have no idea during the course of CP which one will pan out better – or actually if either will lead to a better final answer.  The Real Trick here is: I should not be required to pick.  The Constant Propagation algorithm relies on having a lattice – and lattices are defined as having a unique lower-bound (and unique upper-bound) for any two elements.  This structure does NOT have a unique lower-bound, and so is no longer a lattice!  For some particular Java program and some particularly bad luck with our CP algorithm – the CP algorithm can hang indefinitely, or miss some constants on some runs – but find those constants on other runs for the same Java program.  Garbage-In Garbage-Out.  CP is technically wedged. In practice HotSpot’s CP does not hang, although I suspect I can carefully arrange a Java program for which it would.  Instead, I end up triggering asserts in QA about how my lattice is lacking the proper symmetry (and hence losing constants for no other good reason, but only in weird programs).  I did construct 2 simple programs for which choosing bottom:not_null versus String:exact would find a constant (and enable an optimization) in one program and not the other… and vice-verse for reversing  choices. Cliff ## Postlude As you might have guessed by finding this blog, I’m off to a new adventure – this time using a JVM instead of building one (well maybe: I’m very pragmatic; may the best language win!).  And while I’ll probably still be tinkering in HotSpot from time to time, I think my future blogs will mostly be about my new adventure! – till next time, Cliff

Published by

wpengine

This is the "wpengine" admin user that our staff uses to gain access to your admin area to provide support and troubleshooting. It can only be accessed by a button in our secure log that auto generates a password and dumps that password after the staff member has logged in. We have taken extreme measures to ensure that our own user is not going to be misused to harm any of our clients sites.