in Personal

Adding Transactions to the Non-BlockingHashMap

I went to the winter meeting of DaCapo, a small group of senior academics and industrial scientists mostly focussed on GC & Memory-related issues (but with fairly wide interests).  Much of the work is not yet published; work-in-progress stuff (lack of rigorous experiments, or badly broken implementations, or Big-Ideas-No-Code state, etc).  So I won’t repeat what I saw here directly, contact the authors directly if you’re interested in something.  The DaCapo group has decided to come out with a Java Server benchmark, similar to their popular DaCapo Benchmark Suite.  This will be something that’s much much easier to run than SpecJAppServer (that won’t be hard! SJAS is a total nightmare to setup) and much more realistic than SpecJBB2005.

Of the DaCapo material that’s already published, I like Tracking Bad Apples the best.  I’m going to float this one around Azul and see if we want implement it.  I also stumbled over this gem while perusing Emery Burger’s page.

Adding Transactions to the Non-BlockingHashMap

One of the other things that came out was the result of a quick conversation between Rick Hudson, Eliot Moss and myself: adding software transactions to the NonBlockingHashMap.  A transaction would mean that a single thread could do a series of reads & modifications to the HashMap and either abort them all, or commit them all atomically.  If committed, no other thread would see any partial updates and this thread is assured that no other thread modified any K/V pair read during the transaction.  Transactional support would go a long ways towards making a (fast, scalable, non-blocking) in-memory database (A C I but no D).  One of the reasons this idea looks so tractable, is that all the accesses to the transacted memory are already wrapped in function calls – there’s no requirement for a “below the JVM” or “inside the C compiler” implementation.  It can all be done in pure Java.

The basic idea here is to add another State on the Value slot, representing a “transaction in progress”.  Non-transacting threads seeing this state will be required to do more work: at least 1 more layer of indirection to get the actual value, and at least 1 flag check to decide if they need to report the before- or after- transaction-committed value.  The State would be implemented as a variant of box’ing like I do now: an instanceof test reports that the Value is really a Box, and the reader of the Box now reads through the Box to get the actual value.

I believe I can implement this in the State Machine that drives the NBHM, or at least I have a back-of-the-envelope FSM that handles 2 conflicting transactions.  Transactions would be non-blocking  (so no live-lock – always some thread gets to complete a transaction) – but not fair (I couldn’t find a Wikipedia article on the notion of fairness, although there are plenty of articles with the word fairness in their title).  It’s possible for a thread with long transaction to be continuously aborted by other threads with doing repeated short conflicting transactions.  In DB’s, I suspect fairness is a big deal.  In non-blocking algorithms it’s tough to implement: the endless winners need to stop winning at some point and help the endless loser to win every now and then.  The problem is, the winners are different threads than the loser and can’t actually execute the loser’s code directly (other than the little bits inside the NBHM calls).  If I block the winners to let the loser get through, then the algorithm becomes … well…, blocking.  I can ‘pause’ the winners for a fixed period of time and retain the non-blocking title, although this seems messy.  Does somebody have a better idea here?

Another, API-related question: should the transactional calls be under a completely separate API, or should I just have “begin_transaction”, “commit”, and “abort” calls, and assume all the calls in the middle are being transacted over?  What happens if a thread starts a transaction, then does a non-transactional conflicting write (abort the transaction?)?  Do I allow more than one thread to work on the same transaction?  i.e., make a “transaction object” and allow it to be passed around, and require it in transactional calls?

Thanks,
Cliff