Hardware Visibility vs Finite State Machine

I’ve been having an interesting conversation about the NonBlocking or Lock-Free Hash Table, running down the tail end of this post; hopefully Kevin won’t mind if I repost some of his comments here.  Typepad does not allow comments to have HTML and our replies & counter-replies are getting hard to read!  So Kevin writes:

I would note there’s no way for you to “read the current K/V state”.   You have to agree with that. There’s no such thing or way.

i.e. if you do a load 1000 times, you might still get 1000 old values of K or V.

If you agree that the “actual” K/V state is just a concept that you can’t know then there’s no reason to believe it exists or even discuss it.  i.e. it provides no value if you can’t know it.

There are 2 times where you DO get the “current K/V state” – (1) the clock cycle upon reading a K-that-is-an-X, a K-that-is-a-NULL, or a V (given you already read a not-null not-X K) – and CAS counts as a read. And (2) after a “long time” past the last update to that slot.

You can complete a CAS for a V and still have an old view of a K, for instance.

Yup; the FSM guarantees that at the time you attempted the CAS the “old value for K” is also the current value of K and also the future value of K for all time.  You don’t CAS on a V until K has hit it’s final value.

So the only thing that’s worth discussing is what views each thread has, and what the total legal system state is for all threads.

The proof requires a full enumeration of the legal states for all other threads, along with the description of a single threads FSM, and that no two threads have bad cooperation that causes an erroneous FSM transition.

Yup again; fortunately the FSMs for each K/V slot are independent so the proof of this is nothing more than the trivially replicated proof of each separate K/V FSM.

Now I need to show that many threads running the same FSM are legal.  Again, this is trivial to reduce – by definition of an FSM a FSM does not care which thread makes transitions. So the proof devolves down to “is the FSM correct”.

This point bears repeating: I don’t care which thread makes any transition.  I rely on the hardware’s correct implementation of atomic CAS in order to make transitions.  I don’t care who makes the transitions or in what order they happen – as long as they follow the FSM’s “rules”.

Also, the hw doesn’t provide the abstraction that there is a single K/V state. It provides the behaviors the memory model describes, which is not a set of rules around single values …it’s a set of visibility rules, which inherently allow multiple values for a single address to  exist in a system for a long time.

Yes again – and these views are not contradictory.  i.e., all these statements are true all at the same time:

  • “hardware provides … a set of visibility rules”
  • “hardware allows multiple values for a single address to exist for a long time”
  • “hardware provides the abstraction of a single K/V state”

For the last 2 statements I’ll note that the abstraction of a single K/V state exists it’s just hard to witness.  But not impossible. I witness it via a successful CAS (FSM transition) or if I wait “long enough”.  (I’ll add that I’ve seen “long enough” run into the several minute range on real hardware!!!).

Perhaps it would help if you would provide a counter-example?  Some set of thread schedules & successful CAS’s that move outside the FSM?  You’re allowed to assume any amount of delay on K/V visibility but a successful CAS has to remain atomic.  Such a counter-example would of course blow my claim of correctness so I don’t think you can come up with such a thing.  But the attempt will be constructive for us both.  It’s always good for clarity when somebody else plays “The Counter-Example” game.