Car Repair

Warning: no technical content…

Yesterday my car picked up a howling – like a cross between a demon-scream and cat-torture.  Loud enough it was drowning out the radio; loud enough that other drivers looked at me funny.  It came from this little water box under the hood with a little icon like a windshield water sprayer.  Now my windshield washer has not worked in 5 years, and I’ve just learned to live without it.  Taking a look at this box, it seems like it’s the empty washer fluid box and the pump has decided to stick-on and it’s drumming away on this plastic box with the resulting howl like demons dying.

So I fill it with water, to see if it’ll stop the howling.  Immediately water starts leaking from under the car – my gallon of water is pouring out at a goodly rate, spraying from a dozen leaks.  No doubt 10 years of cheap plastic hose has finally died, and the pump is forcing against plugged washer nozzles; the line has split multiple times and shortly the box is empty and the howling resumes.  Indeed while poking at the lines at a juncture point the hose isn’t holding – and sprays me full hard in the eyes.  But it’s my fresh cold clean water only, no harm.

Now I start looking for how to replace and/or repair this pump.  But it’s really tangled in under the engine; no way I can see to get at it without major dismantling of hood & fender parts.  So I go for plan B: pull the pump’s power cable.  Can’t reach it; can’t even find it.  Plan C: pull the fuse.  The Google fails me, no electric diagrams or fuse-box layouts for a 12-yr old car.  Ahh… but I still have the owner’s manual – and shortly a fuse-box layout.  I’m staring at it looking for “windshield fluid pump” or something… when I see that funny little icon on the water box…

It’s an Intercooler Water pump????  WTF?? is an intercooler pump?  Well – it turns out for a fancy-schmancy sports car to be all fancy-schmancy it has to have some extra high-tech gadgets that the drivers can manually operate.  In this case, it floods water all over the engine – on purpose – to help with evaporative cooling.  You can imagine racing the car on a hot summer’s day; the turbo is compressing the dickens out of the air, and that makes the air very hot.  Cooling it gives horsepower directly.  Also, with the engine constantly being hard pressed, it’s hot too.  Hence the sprays all over the engine.

And sure enough, there’s that same little funny water-spray icon in the middle of my center console, with “auto” and “man” settings.  It’s right in the combat zone for spilled coffee and donuts, and it’s gotten fairly grimy over the years.  Currently it’s stuck down on the “man” side.  I press towards the middle neutral setting.  The howling stops.  Blessed, blessed silence.

I clean it up and it stays nicely in neutral.  I can click it up or down, and hear the relays clacking… and when the water box is full and the button held down on “man” the pump sprays mightily all over the engine.  And on the “auto” setting it’s just silently there.  I replace all the fuse box covers, clean the button, fill the water box… repair is done.  The drive to work is zippy, the Evo is humming nicely… and no howling.

Cliff

Come visit H2O at Strata Booth 919

Greetings H2O friends and fans!

Let’s do the data dance at Strata Santa Clara, Feb. 11-13 and check out our latest H2O Prediction Engine demo.  We will be exhibiting at booth 919 and offering a 20% discount off registration.  The show is slated to sell out, so be sure to register today and get your 20% discount with our code:  0XDATA20, register here.

Also, H2O CEO and co-founder SriSatish Ambati will give a talk at the Big Data Science Meetup on Monday, Feb. 10 in Ballroom E the night before Strata kicks off!

We hit a growth spurt over the last 9 months and have seen amazing customer traction.  Now at 4000 followers, and 45 meetups later, we’re excited to make 2014 a banner year for H2O.  Thank you all for your continued support in the H2O movement, and we look forward to seeing you at the show.

Best wishes,

H2O team

Join the Movement. h2o.ai

The D3 Bomb

The Diablo3 bomb blew through my house this week, destroying work schedules left and right. Every kid (& Dad) played hours of D3.  OMG’s, I can remember D1 – way back in ‘96 before the Diablo’s were numbered.  I must be older than dirt.  Also, being CTO of 0xdata means a zillion customer visits last week (thanks to our plugged-in CEO Sri).  Git claims 600 lines of code from me, down from my weekly average of 3000… blah.  Coding is good for me, I need to do more!

Meanwhile, work at 0xdata is actually proceeding really well despite my lackluster week.  We’re reading & writing HDFS natively.  As I write this, we’re now able to read & write S3.  We’ve got the semantics and design of what is basically the Java Memory Model ironed out for the Cloud (although the implementation is still being worked on).  We’re starting to launch Paxos-based H2O clouds in Amazon EC2.  We’re running larger test suites.

What little coding I did was relating to making Key-delete work right.  The issue is racing Puts followed by Deletes, and delivering a strongly consistent answer when UDP packets are getting lost or re-ordered.  A late-arriving Put cannot “resurrect” a deleted Key and that requires keeping some VectorClock smarts on the deleted Key, instead of just removing all knowledge of the Key.
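
To make the “no resurrection” rule concrete, here is a minimal single-node sketch.  It is not the H2O code: it assumes every write carries one increasing version number (a stand-in for the VectorClock smarts mentioned above), and the names `TombstoneStore`, `put`, `delete` and `get` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a deleted Key leaves a versioned tombstone behind,
// so a late-arriving Put with an older version cannot resurrect it.
class TombstoneStore {
  // One entry per Key ever seen; value == null marks a tombstone.
  private static final class Entry {
    final long version;
    final String value;
    Entry(long version, String value) { this.version = version; this.value = value; }
  }

  private final Map<String, Entry> map = new HashMap<>();

  // Apply a Put (value assumed non-null); returns false if it is stale and was ignored.
  synchronized boolean put(String key, long version, String value) {
    Entry old = map.get(key);
    if (old != null && old.version >= version) return false; // older than what we know
    map.put(key, new Entry(version, value));
    return true;
  }

  // Apply a Delete; keeps the version around instead of forgetting the Key.
  synchronized boolean delete(String key, long version) {
    Entry old = map.get(key);
    if (old != null && old.version >= version) return false;
    map.put(key, new Entry(version, null)); // the tombstone
    return true;
  }

  // Returns the live value, or null if the Key is missing or deleted.
  synchronized String get(String key) {
    Entry e = map.get(key);
    return e == null ? null : e.value;
  }
}
```

With this, a Put at version 1, a Delete at version 3 and then a re-ordered Put at version 2 leaves the Key deleted; only a Put with a version above 3 brings it back.  A real store would also need some way to eventually garbage-collect tombstones, which this sketch ignores.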

We’ve got the Git repo opened up to a handful of people and we’re debating when to open it fully.  I’m voting for “wait a little longer”; in particular I want to iron out the design of the execution engine more.  I.e., “word count” on HDFS should not just run fast & well, it should look good also.  I might get overruled on the timing of this, but in any case look for our Git to open up “soon” – some weeks or less.

In other news, I got my $500 deductible returned to me from AllState (which they got from the other driver’s insurance).  We sold my fiancée’s junker car and upgraded her to a car with only 70K miles (down from 225K miles!  The unkillable Nissan Maxima’s brakes finally failed).  I switched the family over from Sprint to TMobile – it’s a better family plan (for me anyways), and that means I finally upgraded my antique phone… to another antique!  Yes!  I managed to dodge the smart-phone brain-drain that’s got all my colleagues one more time.  🙂

Cliff

What's Going On?

As alluded to in my last blog, here’s my fun hack du jour: “What’s going on?”

I’ve got a multi-node setup with UDP packets slinging back and forth, and each node itself is a multi-cpu machine.  UDP packets are sliding by one another, or getting dropped on the floor, or otherwise confused.  I’m in a twisty maze of UDP packets all alike (yes, I played the game back in the day).  Then something crashes, and pretty quickly the network is filled with damage-control packets, repair & retry packets, more infinite millions of mirror reflection packets.  What just happened?  I press my handy little button and…

… a broadcast of “dump, ship and die” hits the wires (a few extra times for good measure).  All my busy Nodes stop their endless chatter and dump the last several seconds of packets towards my laptop, slowly & reliably, via TCP.  Each node has been gathering all the packets it sent or received (well, the first 16 bytes of each) in a giant ring-buffer, along with time-stamp info and the other party involved.  After I ship all this data from every node to the one poor victim (that I pressed my button on), every other node dies (to prevent further damage).
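
The per-node recorder is easy to picture in code.  This is only a sketch under my own assumptions (a fixed-capacity array, a nanosecond timestamp supplied by the caller, the peer kept as a plain String); `PacketRing`, `record` and `dump` are hypothetical names, not the real thing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the ring buffer: remember the first 16 bytes of each
// packet, when it was seen, and who the other party was.
class PacketRing {
  static final int PREFIX = 16; // bytes kept per packet

  static final class Record {
    final long nanos;     // local timestamp
    final String peer;    // the other party involved
    final boolean sent;   // true = we sent it, false = we received it
    final byte[] head;    // first PREFIX bytes of the packet
    Record(long nanos, String peer, boolean sent, byte[] head) {
      this.nanos = nanos; this.peer = peer; this.sent = sent; this.head = head;
    }
  }

  private final Record[] ring;
  private long count; // total records ever written

  PacketRing(int capacity) { ring = new Record[capacity]; }

  // Overwrites the oldest slot once the ring is full.
  synchronized void record(long nanos, String peer, boolean sent, byte[] packet) {
    byte[] head = Arrays.copyOf(packet, Math.min(PREFIX, packet.length));
    ring[(int) (count++ % ring.length)] = new Record(nanos, peer, sent, head);
  }

  // Returns the surviving records, oldest first, ready to ship over TCP.
  synchronized List<Record> dump() {
    List<Record> out = new ArrayList<>();
    long start = Math.max(0, count - ring.length);
    for (long i = start; i < count; i++) out.add(ring[(int) (i % ring.length)]);
    return out;
  }
}
```

The cost per packet is one small array copy and one slot write, which is why it can stay switched on all the time.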

The Last Survivor gathers up a bunch of very large UDP packet dumps and starts sorting them.  Of course, you can’t just sort on time, that would be too easy.  No, all the nodes are running with independent clocks; NTP only gets them so close in time to each other.  Instead I have to sort out a giant Happens-Before relationship amongst my packets.  I am helped (above and beyond some sort of home-brew wire-shark) by my application understanding its own packet structure.  I know certain packets must be strongly ordered in time, never mind what the clock says.  For example, I only send out an ACK for task#1273 strictly after I receive (and execute) task#1273.  Paxos voting protocols follow certain rules, etc, etc.
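
One way to do that kind of sort is a topological sort over the “must come before” constraints, falling back to the (untrustworthy) local clock only to break ties.  The sketch below assumes that approach; it is my illustration, not the actual tool, and `HappensBeforeSort`, `Event` and `before` are invented names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: order events by happens-before constraints first,
// and by local timestamp only when no constraint says otherwise.
class HappensBeforeSort {
  static final class Event {
    final String node;
    final long localTime; // that node's own clock, not comparable across nodes
    final String what;
    final List<Event> after = new ArrayList<>(); // events that must come later
    int pending;                                 // predecessors not yet emitted
    Event(String node, long localTime, String what) {
      this.node = node; this.localTime = localTime; this.what = what;
    }
    // Declare: this event strictly precedes 'later' (e.g. a send before its receive).
    void before(Event later) { after.add(later); later.pending++; }
  }

  // Kahn's algorithm with a timestamp-ordered ready queue.  Consumes the
  // 'pending' counters, so call it once per set of events.
  static List<Event> order(List<Event> events) {
    PriorityQueue<Event> ready =
        new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.localTime));
    for (Event e : events) if (e.pending == 0) ready.add(e);
    List<Event> out = new ArrayList<>();
    while (!ready.isEmpty()) {
      Event e = ready.poll();
      out.add(e);
      for (Event n : e.after) if (--n.pending == 0) ready.add(n);
    }
    return out; // shorter than the input if the constraints contain a cycle
  }
}
```

Take the ACK rule above: node A sends task#1273 at its time 100, node B (whose clock runs behind) receives it at 90 and ACKs at 95.  A plain time sort would put the receive before the send; with the send-before-receive and receive-before-ACK constraints declared, the send comes out first.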

In the end, I build a very large mostly-correctly-ordered timeline of what was just going on, as seen by each Node itself, and then HTML’ify it and pop it up on the browser.  Voila!  There for all the world to see is the blow-by-blow confusion of what went wrong (and generally, the follow-on error “recovery” isn’t all that healthy, so more broken behavior follows hard on the heels of broken behavior).

Basically, I’m admitting I’m a tool-builder at heart.  As soon as I realized that standard debuggers don’t work in this kind of situation, and wireshark couldn’t sort based on domain-specific info (and pretty-print the results, again using domain-specific smarts), I went into tool-building mode.  As of this blog, I’ve found several errors in my cloud setup already; e.g. a useless abort-and-restart of a Paxos vote if a heartbeat arrives mid-vote from an ex-cloud-member (that’s alive and well and wants to get back in the Cloud), and some infinite-chatter issues getting key replication settled out as nodes come and go.

On other fronts, my car came back from the body shop, only to turn around and go back to the engine shop: the timing belt had slipped.  The work was done under warranty and I’ll go pick up my car on Monday.  I can hardly wait!!!

My GF’s car’s brakes have been squealing for weeks; they finally started shuddering and we decided it was time to fix them.  She’s driving a 1993 Nissan Maxima with 220K miles on it; weird things start breaking at that age, but mostly the car just soldiers on.  But it was time for the brakes.  We pulled the rear pads & looked at the rotors: one of them was shot.  Fortunately a new rear rotor was only $25, plus another $22 for pads (tax, brake grease, still under $50).  We couldn’t get the dang pistons to move back! We tried at least 5 different wrench/jig/clamp combos to no avail.

We figured the pistons must have been jammed with debris, so with great trepidation we pulled the brake fluid line, the emergency brake cable and pulled the whole unit to my workbench.  I popped the piston out manually.  It looked clean and good… and had this funny thing in the middle… stupid me, failed to check the internet again… it’s the anti-slip mechanism for the emergency brake.  You have to spin the piston to screw it back into the cylinder.  Sigh.  It took us another 1/2hr to find the right tool to spin the dang thing, but it finally went in without too much trouble.  After that it was another hour to reassemble all the parts, and then we had to bleed and bleed and bleed the line.  As of this writing, the pedal is still too soft; I suspect we need to bleed it some more.

Daughter is at the Old Salts Regatta, plus a ton of driving to meet people for 0xdata, plus a much needed dinner out… and down 2 cars (GF’s brakes-in-progress and my car in the shop), made for a very complicated week.

Cliff

hack, life, hack, life

{life,hack}*

Some Real Life and Hacking interleaved this week.  I added file-upload via the browser, and then ran into a problem with a misconfigured machine poisoning my Paxos voting.  The problem isn’t Paxos itself, but I kept endlessly triggering a new round of voting, for no good reason.  That took a day to settle out on a reasonable approach.

AllState came out to look at my car, and haul it off to the shop… for 3 weeks.  Ugh.  Bent the door reinforcement beam, so I need a new door.  The Evo’s been a rock-solid reliable car with 120K miles on it (and yes I drive it like an Evo should be), so I’m not going to skimp on the repair.  But 3 weeks driving my family beater van.  Ugh.  Also shot about half a day to talk with the various people (appraiser-scheduler, appraiser, claims manager, body shop).  (fyi – the beater van is a Toyota Previa with 165K miles on it, purchased explicitly to train teenage drivers; it’s slow & reliable and cheap).

I got our new hire to do a nice hack for better HTML generation from inside the Cloud.  We want a way to let H2O developers (that’s mostly me, Jan Vitek & the new guy right now) churn out HTML pages with internal stats on the Cloud.  So we need an easy way to write fairly pretty HTML, and also fill-in-the-blanks with some Plain Olde Java Code, and install with no extra files (a single JAR install) and be rock-solid reliable on ANY network & browser config (means: No JavaScript! for those security-conscious networks), and be debuggable in-house (means: no downloading large packages e.g. JSPs, etc; been there, done that, watched several good engineers lose their lives debugging that crud).  So what did we come up with?

You write a Java class for each unique browser page:

public abstract class HTMLIndex extends HTMLUtils {

You have one big HTML String for the page:

final static String html =
HTML.dtd
+ "<html>"
+ "  <head>"
+ "     <title>Welcome to H2O</title>"
+ CSS.tables
+ "  </head>"
+ "<body>"
+ "<p><a href='Store.View'>Surf the local K/V Store</a>"
+ "<p>The Local Cloud has %subnet_size members"
+ "<p><a href='Upload'>Upload a file</a>"

(ugh: not sure how to get code to not auto-html-ize here.  It’s a plain Java String with plain HTML embedded in it).  Notice a few boilerplate strings already folded in, including some CSS.  Notice I got href links in there, other html and that funny %subnet_size token.  This isn’t the whole HTML string for this page, more later.  But first: You write the fill-in-the-blanks function, filling in the %subnet_size token with H2O.Cloud._subnet_size:

public static H2ONanoHTTP.Response serve( H2ONanoHTTP server, String uri, Properties header ) {
  String source = html;
...
  source = replace(source,"subnet_size",H2O.Cloud._subnet_size);
  return server.wrap(source);
}

Simple, dumb, frightfully easy to use.  I’m sure it’s been thought of a zillion times before.  (and thanks to NanoHTTP for a 1-file web server)  Oh, and for tables your HTML string looks like:

final static String html =
    ...
    + "<p>The Local Cloud has %subnet_size members"
    + "<p><a href='Upload'>Upload a file</a>"
    + "<p><table>"
    + "<tr><th>Local Nodes</th><th>CPUs</th><th>FreeMem</th><th>TotalMem</th><th>MaxMem</th><th>FreeDisk</th><th>MaxDisk</th></tr>"
    + "%tableRow{"
    + "  <tr class='%rowClass'>"
    + "    <td><a href='Node=%host'>%node</a></td>"
    + "    <td>%num_cpus</td>"
    + "    <td>%free_mem</td>"
    + "    <td>%tot_mem</td>"
    + "    <td>%max_mem</td>"
    + "    <td>%free_disk</td>"
    + "    <td>%max_disk</td>"
    + "  </tr>"
    + "}"
    + "</table>\n"

And the fill-in code:

    for (H2ONode h2o : H2O.Cloud._members.values()) {
      // only replace the table line, we will insert the values later one by one
      source = multiReplace(source, "tableRow");
      String host = h2o._inet.getHostAddress();
      source = replace(source,"rowClass",(alt++&1)==0 ? "rowOdd" : "rowEven");
      source = replace(source,"host",host);
      source = replace(source,"node",host);
      source = replace(source,"num_cpus" ,            h2o.get_num_cpus () );
      source = replace(source,"free_mem" ,toMegabytes(h2o.get_free_mem ()));
      source = replace(source,"tot_mem"  ,toMegabytes(h2o.get_tot_mem  ()));
      source = replace(source,"max_mem"  ,toMegabytes(h2o.get_max_mem  ()));
      source = replace(source,"free_disk",toMegabytes(h2o.get_free_disk()));
      source = replace(source,"max_disk" ,toMegabytes(h2o.get_max_disk ()));
    }
    source = remove(source,"tableRow");
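
For the curious, here is one way the three helpers used above could work.  This is a guess at the behavior, not the real HTMLUtils code: I assume `replace` fills the first `%token` it finds, `multiReplace` copies the body of a `%name{ ... }` block in front of the block so the next round of `replace` calls fills that fresh copy, and `remove` deletes the block template at the end.  The class name `TemplateSketch` is made up.

```java
// Hypothetical sketch of the fill-in-the-blanks helpers.
class TemplateSketch {
  // Replace the first occurrence of %token with the value's string form.
  static String replace(String src, String token, Object value) {
    String key = "%" + token;
    int i = src.indexOf(key);
    if (i < 0) return src;
    return src.substring(0, i) + value + src.substring(i + key.length());
  }

  // Copy the body of %token{...} just before the block, leaving the block
  // itself in place so it can be copied again for the next row.
  static String multiReplace(String src, String token) {
    String open = "%" + token + "{";
    int start = src.indexOf(open);
    if (start < 0) return src;
    int end = src.indexOf('}', start); // assumes no '}' inside the body
    if (end < 0) return src;
    String body = src.substring(start + open.length(), end);
    return src.substring(0, start) + body + src.substring(start);
  }

  // Delete the %token{...} template once all rows have been emitted.
  static String remove(String src, String token) {
    String open = "%" + token + "{";
    int start = src.indexOf(open);
    if (start < 0) return src;
    int end = src.indexOf('}', start);
    if (end < 0) return src;
    return src.substring(0, start) + src.substring(end + 1);
  }

  public static void main(String[] args) {
    String page = "<table>%tableRow{<tr><td>%node</td></tr>}</table>";
    for (String host : new String[] {"10.0.0.1", "10.0.0.2"}) {
      page = multiReplace(page, "tableRow");
      page = replace(page, "node", host);
    }
    System.out.println(remove(page, "tableRow"));
  }
}
```

Because each copy lands ahead of the template, “first occurrence” always means “the row being filled right now”.  The catch in this simple version is that a token used twice in one row would need two `replace` calls.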

That ‘replace’ function does the obvious thing.  Very nice.  A for-loop in the Java-code to fill in the HTML table.  Clean, simple to use.  Yes I am stupidly happy about this hack.  Thanks Petr.

More Real Life: changing jobs means changing insurance.  I COBRA’d the skipped month between jobs, but it doesn’t kick in immediately (instead it kicks in retroactively).  This means all those bills I take for this entire month have to be re-filed with each individual provider so they can be back-billed to COBRA.  Also, all my providers need to know about the COBRA this month, and about the new plan next month.  TOTAL pain in the butt.  My teenager crushes her glasses weekly, typically using her face to cushion her headlong rolling dive in the grass, or to put that wrestling move over on her siblings, or to test the power of belt-sanders & power tools, stuff like that.  I’m glad she’s enjoying life vigorously but there’s a limit to how far an epoxy repair job can go, so I finally had to pony up for another $400 pair of glasses… during the COBRA-hell-month.  My youngest is in for another round of orthodontics.  My cholesterol prescription ran out.  My colonoscopy (after insurance) is running $600.  And so on: each is another shot at my wallet when I’m in the 1-month no-paycheck limbo, and the fact that COBRA cashed my premium check is just ironic.

My hack(s)?  First was a major cleanup of my async RPC stuff; greenfield coding is great and all that, but after some QA & a little hammering it needed to go from “fun hack” to “production ready”, which needs a whole ‘nother level of thinking.  I’ll save the other major hack for next blog.

Last bit of ‘life’: I’ve been listening to the music group Bond.  Awesome stuff and easy on the eyes…  😉

Cliff

Hit and Run

I was driving in Palo Alto with 2 of my kids in the car heading towards Fry’s, looking to get the largest monitor I could reasonably buy.  I was driving down one of the many fairly narrow 4-lane side streets, heading towards El Camino Real when a driver pulled into my blind spot.  With a block-and-a-half to go to the red light, the driver very slowly started to overtake me… then he started edging into my lane.  I honked and edged up and over to give me some space… with now much less than a block to go the driver suddenly floored his vehicle and came over into my lane (his lane was ending quickly, being filled with a large yellow Caddie parked at the light).  I hit the brakes but couldn’t get any more over (no shoulder, sidewalk full of telephone poles and other crud).  Crunch… I was fairly certain he had hit the Caddie, and I knew he had hit me.

I checked my kids, all ok.  I looked out the window at the other car & the Caddie.  He was between the Caddie and me, and was trying to back down the street – but got blocked by cars coming up behind him.  He stopped moving and started getting out of his vehicle, so I took a breather and got a clearer look – there were lots of cars, lots of honking; my car looked drivable and in the way and there was a Jiffy Lube at the corner, so I waved to him and rolled around the corner and parked.

I got out and headed over to his car… and he was busy yelling and screaming at the Caddie driver (who had been parked at a red light the whole time!).  He looked at me and got back into his car and rolled up to the light and around the corner, clearly to join me at the Jiffy Lube… and took off instead.  Gone.  I was dumbfounded, and stood looking at his vanishing car like an idiot.  One of the Jiffy Lube guys came up to me and asked if I was ok; I nodded and went over to check out the Caddie guy.  He was a timid old man and was really shook up by the screaming-at he’d just had.  It was clear his vehicle was blocking the road and looked untouched.  I stuck with him until he felt better, then he got in his car and drove off.  I walked back to mine to survey the damage and double-checked the kids.  Sure enough, a big dent with lots of missing paint.

Then the Jiffy Lube guy volunteered that he had heard the crash, had come around the corner, had watched the driver chew out the Caddie… and decided the driver looked “off”.  He had the plate number written down and encouraged me to call the cops.  So I did; and sure enough in about 5 minutes a Palo Alto policeman showed up.  He was very polite and called in the plate#, and started taking statements.  In another 10 minutes he reported that they had caught the guy!  Then he offered me a chance to eye-witness the driver where they had stopped him… so I followed the cop back across Palo Alto and did a slow drive-by… and indeed it was the same guy.  They offered me the chance to press charges – but they also reported he didn’t seem drunk/drugged – so I decided to do this with just insurance instead.  Maybe he was just having a bad day.

Today AllState came out and looked at my car, and hauled it off to the body shop.  Maybe in a week I have a car with a fresh paint job?  Kudos to the Palo Alto police department; that was quick work catching the guy.  Kudos to the Jiffy Lube guy who had the brains to get the plate number.  And I did eventually get my monitor.

You didn’t think I was going to talk about technology, did you?

Cliff

Where'd The Week Go?

Where’d the week go?  I managed to implement an async RPC call based on UDP packets; that was kinda slick.  It’s more-or-less stateless, so I can re-send the call or re-send the ack+result repeatedly without anybody getting confused.  I added an HTML RESTful Get/Put API, plus some Key surfing.  Based on this+Paxos, I have the starter bits for a distributed K/V store!  Seems like everybody has one of these, so I figured I’d better go make one also.  😉
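
Why can the call and the ack be re-sent freely?  One common way to get that property (and this is my sketch of the general idea, not the actual 0xdata code) is to tag every call with a task number, have the receiver remember the result per task, and have the caller ignore repeat acks.  No sockets here, just the bookkeeping; `IdempotentRpc`, `onCall` and `onAck` are made-up names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: duplicate call packets and duplicate acks are harmless.
class IdempotentRpc {
  // Receiver side: runs each task at most once, answers repeats from memory.
  static final class Server {
    private final Map<Integer, Integer> done = new HashMap<>();
    private final IntUnaryOperator work;
    int executions; // how many times the real work actually ran

    Server(IntUnaryOperator work) { this.work = work; }

    // Handle a (possibly re-sent) call packet; the return value goes in the ack.
    synchronized int onCall(int taskId, int arg) {
      Integer cached = done.get(taskId);
      if (cached != null) return cached; // re-sent call: just re-send the result
      executions++;
      int result = work.applyAsInt(arg);
      done.put(taskId, result);
      return result;
    }
  }

  // Caller side: the first ack wins, later copies are dropped.
  static final class Client {
    private final Map<Integer, Integer> results = new HashMap<>();

    // Returns true only the first time an ack for this task arrives.
    synchronized boolean onAck(int taskId, int result) {
      return results.putIfAbsent(taskId, result) == null;
    }

    synchronized Integer result(int taskId) { return results.get(taskId); }
  }
}
```

So the caller can keep firing the same packet until an ack shows up, and a lost ack costs nothing but a retry.  The sketch never forgets finished tasks; a real version would need some rule for when it is safe to drop them.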

Then I tried my toy on Linux…it’s been running great on Windows & Mac with 3 to 5 machines.  Linux should be a piece-o-cake!  Or not so.  First stumbling block was Java network self-discovery is painful on Linux: lots and lots of network devices are enabled but few are actually connected to the network.  Then I ended up enabling an IPV6 device… I’m not quite IPV6 ready (close, but not there).  Also, I kept finding loopback addresses.  Then I hit Java signed-byte issues (gah!  a COMPLETELY ridiculous language design decision; bytes are 99.9% used in an unsigned fashion).  Took me another day to pull out all the bugs… then we immediately ran and demo’d it!  (shakes head)   There’s like 1 week’s worth of work in there!  I’m not sure what to make of it all; we were completely upfront with where the product was… and the end-user is loving it.
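
Both Linux stumbling blocks have small, standard-library answers.  The sketch below is a guess at the shape of a fix, not what I actually shipped: it walks `java.net.NetworkInterface`, skips devices that are down or loopback, keeps only IPv4 addresses, and shows the usual `& 0xFF` mask for reading a byte as unsigned.  `NetSketch`, `usableIPv4` and `unsigned` are invented names.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical sketch: pick candidate addresses for self-discovery.
class NetSketch {
  // IPv4 addresses on interfaces that are up and not loopback.
  static List<InetAddress> usableIPv4() {
    List<InetAddress> out = new ArrayList<>();
    try {
      Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
      if (nics == null) return out; // no interfaces reported
      for (NetworkInterface nic : Collections.list(nics)) {
        if (!nic.isUp() || nic.isLoopback()) continue; // enabled is not the same as up
        for (InetAddress a : Collections.list(nic.getInetAddresses()))
          if (a instanceof Inet4Address && !a.isLoopbackAddress()) out.add(a);
      }
    } catch (SocketException e) {
      // Could not query the interfaces; report what we have so far.
    }
    return out;
  }

  // Java bytes are signed (-128..127); mask to get the 0..255 value.
  static int unsigned(byte b) { return b & 0xFF; }
}
```

An interface being “up” still does not prove it reaches the other nodes, so a list like this is a starting point for discovery rather than the answer.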

We ran back to the office High-Five-ing each other and stared at our big empty room.  And then a friend of Sris called: an unrelated startup in S.F. has stumbled and is selling off new office furniture at firesale prices.  We ran to a U-Haul shop & got a $20 box truck (plus $1/mile … that adds up fast).  An hour later we were stuffing desks and chairs into the thing.  We got 6 desks and 6 nice ergo chairs for $1500…. and that is fantastically higher than our rent… I think I’m losing my sense of money here.  (shakes head again)  We ran away with our booty and an hour later were unloading into the not-so-empty office. Those desks have slick wheels and the floor is good and the room is big; after I got up a good head of steam I hopped on the desks for a ride across the room.  Brings new meaning to the word “execu-glide”.  Holy moly!  The office is starting to look… office-like.  Needs a microwave (but not a coffee machine; the coffee press came in the building with me on that first visit; must have priorities).  I left Sris to return the truck, and I ran home to a long promised date-night.  Made it just in time and a nice romantic dinner was had, complete with thunderous rain and lightning (umm, excuse me Mr. California?  I pay my cost-of-living taxes!!!  It’s mid-April… where’s the warm sunshine we were promised?)

Next day was mostly spent doing paperwork.  And I’m only doing the new employee paperwork – Sris’ got the backend of this mountain.  Sigh.  Startups are 90% perspiration, 9% inspiration and another 90% paperwork.  Health insurance, payroll, building insurance, PGE wants to be paid, the office water got shut off (fixed now), W-4’s, I-9’s, 401K’s… the list goes on and on.  My home printer died.  My SO’s netbook died (fixed by pulling the battery and power-cycling to drain the caps so it REALLY got a cold-start).

Happy-not-sigh: my kids came back from their mom’s after a week-long spring-break vacation.  More Halo with the boys, more stupendously silly dinner conversation.  The refurbished bathroom looks amazing, and everybody got to apply giant fish stickers on the walls.  Today I am playing taxi-Dad and taking one kid to Sea Scouts, then taking the others shopping for giant monitors, hacker keyboards and Magic-the-Gathering card decks.

Cliff

Hackers Weekend, Dining Philosophers

We had a hackers weekend at 0xdata, inviting over some potential new hires (and all the handful of employees) and then having a weekend long whiteboard “discussion” – a long excited romp through the innards of what 0xdata is building, and what we can do better, and plain old how-to-solve some problems, which markets to go after first, etc, etc.

The furniture we bought isn’t due to show up yet so I tossed down some folding chairs and beanbags I brought from home, plus an aged whiteboard I had lying around.  We used an iPhone for a local wi-fi hotspot (the building’s Real Internet is coming Tuesday) – and the connectivity sucked.  Great for testing out a budding Cloud’s connection & repair logic.  Paxos worked flawlessly.  We stopped by Safeway for the required junk food and Starbux coffee, and ended up snacking in the sunshine, sipping coffee, and talking distributed & concurrent algorithms.  Dinner rolled through coding exercises, beer, the Right Way to run a programmer’s interview, more distributed algorithms, more beer, the comical errors of past VCs, and whether sex, beer or coding is better (drugs & rock-n-roll never made the list).  Holy crap!  If I had known running my own startup would be this much fun, I’d have done it years ago!!!

Cliff

What the Heck is Cliff Up To?

I finally “Got With The Program” – the standard Silicon Valley program that is – and founded a new startup, 0xdata. That’s zero-ex-data like hexadecimal but with ‘data’ instead of ‘decimal’ – and pronounced “hexdata”.  I’ve been having a blast doing all those things founders do.  In particular, I’m hacking code seriously again for the first time in years.  It’s green-field hacking and I’m pumping out the code.  I hacked a Paxos algorithm over the weekend, directly writing bits into UDP packets… fun!  We’re doing distributed systems stuff and using Paxos for auto-Cloud-discovery.  I tested with Paxos by randomly killing JVMs with “kill -9” and re-launching in a tight loop.  Nodes come & go, but the Cloud remains.  Paxos Rocks!

Today I did Yet Another Interview, 5th or 6th this week, in RedRock, Mountain View.  Sharp guy, but well encased in Generic Big Company… and bored silly.  We’ll see.  Did I mention we have office space now (I mean besides RedRock)?  And basically for free?  The owner took a modest amount of stock in lieu of rent.

We went furniture shopping… at the “startup furniture store”.  This guy cycles used office furniture in and out through startups.  Interesting business model and he had the Right Stuff for us, for cheap. The guy was hysterical!  He was cracking a joke every few seconds, and any possible straight-line was immediately played on.  I think he’s seen one too many startups in his time; once he sold some furniture to a startup guy, bought it back 6 months later on fire sale, then sold it again to the same guy again.  His business model must be one of the more reliable ones around, sorta like the gold-rush merchants who made it rich selling shovels to miners.

What am I doing technically?  Alas, that mostly has to remain quiet for a bit longer.  But the Universe has a ‘hole’ in it, a vacuum, an imbalance of available fast cheap fragile X86 servers with plenty of DRAM, big disks and fast networks – and on the other side a real absence of easy-to-use programming tools.  Those big disks have filled up over the years with Big Data, and thanks to MapReduce, Hadoop and NoSQL we are able to store and address a bunch of connected machines as a single resource.  In other words: we got plenty of unreliable CPU, memory, disk & network… but we can’t get at it with the same ease the hardware guys made possible when going to DRAM from multiple CPUs connected over internal buses.  Until we break that ease-of-use barrier, we’ll never get every-day programmers coding distributed systems as easily as we do single machines now.

Programmers are crying out for a Better Answer, a better way to convert Big Data into Knowledge.  This is the beginning of the next epoch in distributed computing – and it is only in its first generation.  The best is yet to come.

I’m On The Job.

Cliff

Too Much Theory, Part 3

This is the 3rd and last part of this blog!  (thank heavens!), where I wax poetic about Lattices and Type Theory, and their applications to Compilers and in particular Java JITs.

Part 1: Too Much Theory
Part 2: Too Much Theory, Part 2

## A Quick Recap

We’re building a lattice, to allow us to do exciting optimizations on Java programs.  Our lattice has distinguished top and bottom types, and is Commutative, Associative, and Symmetric.  We understand pointers can be null, not_null, or unknown (e.g. bottom).  Because of symmetry across the lattice centerline, we also have the dual of not_null which we’ll call any_null.  For this recap, we’ll also have Java Classes (e.g. String), but we’re going to ignore subclassing for the moment.  Our lattice looks something like this:

Wow, that’s getting fancy…  let’s revisit some of the elements in this lattice real quick:

* bottom: All possible values, including null, as computed by running the program.  No constants.
* String:bot: All possible Strings, including null, as computed by running the program.  No constants, no XMLNodes.  For brevity I will sometimes refer to this as String (no trailing :bot notation).
* XMLNode:bot or plain XMLNode: All possible XMLNodes, including null, as computed by running the program.  No constants, no Strings.
* bottom:not_null: All possible values, as computed by running the program.  No constants, no null.
* String:not_null: All possible Strings, as computed by running the program.  No constants, no XMLNodes, no null.
* String:hello_world: The constant String “hello world”, and nothing else.
* null: The constant null.
* String:any_null: All possible String constants, all at once.  No XMLNodes, nor null.  An impossible situation in a real program, but a useful mathematical notion when running Constant Propagation.
* String:top: All possible String constants and the null constant, all at once.  No XMLNodes.
* top: All possible constants, including all Strings and all XMLNodes and null, all of them at once.

Notice the symmetry: every Node below the centerline of constants also appears in dual form above the centerline.  The edges are also symmetric: e.g. top has 4 edges out and bottom has 4 edges in, String:top has 1 in, 2 out, and String:bot has 2 in, 1 out, etc.

This lattice will let us find the constants in code like this example from last time:

final class LinkedListThing {
  LinkedListThing _next; // instance field
  int hashCode() {
    int hash = 0;
    Object x = this; // x is LinkedListThing:not_null
    while( x != null ) {
      hash += x.hashCode(); // x is LinkedListThing:not_null
      if( x instanceof LinkedListThing )
         x = ((LinkedListThing)x)._next; // x is LinkedListThing
      else // with Conditional Constant Propagation...
         x = "fail"; // ...this path is dead
      // x is LinkedListThing:bot and not a String
    }
    return hash;
  }
}
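
To make the recap a little more concrete, here is a minimal sketch of just the null-ness axis (top, any_null, null, not_null, bottom), with classes and individual constants left out. The names `Nullness`, `below`, `dual` and `meet` are made up for this illustration and are not HotSpot's code; the ordering is transcribed by hand from the element descriptions above, and treating null as its own dual is an assumption of the sketch.

```java
import java.util.EnumSet;

/** Illustrative only: the null-ness axis of the lattice, classes and constants omitted. */
enum Nullness {
  TOP, ANY_NULL, NULL, NOT_NULL, BOTTOM;

  /** Every element at or below this one, written out by hand. */
  EnumSet<Nullness> below() {
    switch (this) {
      case TOP:      return EnumSet.allOf(Nullness.class);
      case ANY_NULL: return EnumSet.of(ANY_NULL, NOT_NULL, BOTTOM);
      case NULL:     return EnumSet.of(NULL, BOTTOM);
      case NOT_NULL: return EnumSet.of(NOT_NULL, BOTTOM);
      default:       return EnumSet.of(BOTTOM);
    }
  }

  /** Mirror image across the centerline; null is assumed to be its own dual. */
  Nullness dual() {
    switch (this) {
      case TOP:      return BOTTOM;
      case ANY_NULL: return NOT_NULL;
      case NOT_NULL: return ANY_NULL;
      case BOTTOM:   return TOP;
      default:       return NULL;
    }
  }

  /** Meet: the single greatest element sitting at or below both inputs. */
  Nullness meet(Nullness that) {
    EnumSet<Nullness> common = this.below();
    common.retainAll(that.below());
    for (Nullness c : common)
      if (c.below().containsAll(common)) return c; // c dominates every other candidate
    throw new IllegalStateException("no unique meet for " + this + " and " + that);
  }
}

final class NullnessDemo {
  public static void main(String[] args) {
    // Print the whole meet table for the five elements.
    for (Nullness a : Nullness.values())
      for (Nullness b : Nullness.values())
        System.out.println(a + " meet " + b + " = " + a.meet(b));
  }
}
```

On this tiny axis every pair has exactly one meet, so the `IllegalStateException` branch is never taken. For example, mixing null with not_null falls all the way to bottom, while mixing top with anything hands back the other element unchanged.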

## And Now Gentle Readers, Let Us Ponder Subtypes

Java, of course, has subtypes. So does C++ and many other languages. Let’s take a look at what it would take to add subtypes to our lattice. This tiny program snippet makes exactly a Hashtable:

  Hashtable x = new Hashtable(); y = x.get(); // Calls Hashtable.get

And this snippet makes a known subclass of Hashtable, but treats it like a Hashtable:

  Hashtable x = new Provider();  y = x.get(); // Calls Provider.get

And this snippet mentions an UNknown subclass of Hashtable:

  void foo( Hashtable x ) {      y = x.get(); // Calls ???
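
The three snippets above can be put into a small runnable form. This is only a sketch: the nested `Provider` class here is a hypothetical stand-in for the subclass mentioned above (not `java.security.Provider`), and `foo` and `isExactlyHashtable` are illustration names. It shows that the declared type `Hashtable` alone does not say which `get` will run; only the runtime class does.

```java
import java.util.Hashtable;

final class ExactClassDemo {
  /** Hypothetical subclass standing in for the post's Provider. */
  static class Provider extends Hashtable<String, String> {
    @Override public String get(Object key) { return "from Provider.get"; }
  }

  /** Declared type is Hashtable; the target of get() depends on the runtime class. */
  static String foo(Hashtable<String, String> x) {
    return x.get("k");
  }

  /** The fact a compiler would like to prove statically, checked here at runtime. */
  static boolean isExactlyHashtable(Object x) {
    return x.getClass() == Hashtable.class;
  }

  public static void main(String[] args) {
    Hashtable<String, String> exact = new Hashtable<>();
    exact.put("k", "from Hashtable.get");
    Hashtable<String, String> sub = new Provider();

    System.out.println(foo(exact) + ", exact=" + isExactlyHashtable(exact));
    System.out.println(foo(sub) + ", exact=" + isExactlyHashtable(sub));
  }
}
```

Inside `foo` both calls look identical, yet the first lands in `Hashtable.get` and the second in `Provider.get`.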

When does ‘Hashtable x’ refer to exactly a Hashtable, and when does it refer to some subclass of Hashtable? Knowing if some value is exactly a Hashtable versus a subclass is very useful: we can optimize calls made to exactly known classes.

Example: What function is called when we call “x.get()”? Well, if x is exactly a Hashtable, we get Hashtable.get() … and we can inline “get()”. If x is a Hashtable-or-a-subclass, then “x.get()” might be Provider.get() or Hashtable.get() or some other user-specified derived version of “get()”, and we cannot inline.

The job of figuring out if ‘x’ is exactly of class Hashtable, or some subclass of Hashtable falls to Constant Propagation – and that requires we represent this notion of ‘exact class’ in the lattice. Let’s add the notion ‘Hashtable:exact’ to mean exactly a Hashtable, and NO subclasses, while plain ‘Hashtable’ allows subclasses. This is an independent axis from allowing null, so we can still have e.g. Hashtable:exact:not_null.  Note that final classes like String cannot be subclassed and so are always String:exact; constants are always the exact class that they are, e.g. Hashtable:exact:0x1234.

I have another “Lattice” for you to ponder, with Hashtables instead of XMLNodes.  I’m still using bottom in places but I might as well use Object:bot or just plain Object.  Also, I ‘twisted’ the picture slightly: lattice elements in the middle all exclude null, and those on the outer edges all include null.  I had to duplicate the null element to make the graph lay flat visually, but really they are the same element.  I could have laid the graph ‘flat’ on a cylinder, but the web’s not up to 3-D visualization yet.

## And Now The Punch Line

And now for the Trick Question: what happens when I end up mixing together (with Phi functions on loop backedges) Hashtable:top with String:exact:not_null?
(A different question is how I come to such a situation that I am attempting to mix these types… but trust me I’ve seen this happen in QA from suitably complex programs!)  So back to my Question: I am looking for the most precise answer possible, anything less and I’ll be losing type information for no good reason.  So for example bottom is a correct answer, but very conservative – we can do better than that!

How about bottom:not_null?  Since for Hashtable:top the compiler can pick any Hashtable it wants – we want it to pick a Hashtable, ANY Hashtable.  The result of mixing that with String:exact:not_null is … some random not-null Object: bottom:not_null.  This might let me remove a null-pointer check at compile time.

But we could also take just the null from Hashtable:top, since the compiler is allowed to pick that instead!  Remember: the top label means the compiler “gets to choose” during the course of Constant Propagation – and picking a more precise answer makes it more likely we’ll find a constant and have our choice remain correct.  So mixing null and String:exact:not_null yields String:exact.  This might, e.g., let me convert an unknown x.hashCode() into a known String:hashCode() call.

So which one do I pick?  bottom:not_null or String:exact?  Typically I have no idea during the course of CP which one will pan out better – or actually if either will lead to a better final answer.  The Real Trick here is: I should not be required to pick.  The Constant Propagation algorithm relies on having a lattice – and lattices are defined as having a unique greatest lower-bound (and unique least upper-bound) for any two elements.  This structure does NOT have a unique greatest lower-bound, and so is no longer a lattice!  For some particular Java program and some particularly bad luck with our CP algorithm – the CP algorithm can hang indefinitely, or miss some constants on some runs – but find those constants on other runs for the same Java program.  Garbage-In Garbage-Out.  CP is technically wedged.
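
The two competing answers can be checked mechanically on a hand-built fragment of the structure. This is a sketch under assumptions: the enum `T`, its `directlyBelow` edges and the `maximalLowerBounds` helper are all invented for illustration, the edges are transcribed from the reasoning above rather than taken from HotSpot, and `HASHTABLE_NOT_NULL` stands in for whatever non-null Hashtable the compiler might pick out of Hashtable:top.

```java
import java.util.ArrayDeque;
import java.util.EnumSet;

final class BrokenMeetSketch {
  /** A hand-picked fragment of the subtype-aware structure; not the full picture. */
  enum T {
    HASHTABLE_TOP, STRING_EXACT_NOT_NULL, NULL, HASHTABLE_NOT_NULL,
    STRING_EXACT, BOTTOM_NOT_NULL, BOTTOM;

    /** Elements immediately below this one (assumed edges, see lead-in). */
    T[] directlyBelow() {
      switch (this) {
        case HASHTABLE_TOP:         return new T[] { NULL, HASHTABLE_NOT_NULL };
        case STRING_EXACT_NOT_NULL: return new T[] { STRING_EXACT, BOTTOM_NOT_NULL };
        case NULL:                  return new T[] { STRING_EXACT };
        case HASHTABLE_NOT_NULL:    return new T[] { BOTTOM_NOT_NULL };
        case STRING_EXACT:          return new T[] { BOTTOM };
        case BOTTOM_NOT_NULL:       return new T[] { BOTTOM };
        default:                    return new T[0];
      }
    }

    /** Everything reachable by walking downward, including this element itself. */
    EnumSet<T> below() {
      EnumSet<T> seen = EnumSet.of(this);
      ArrayDeque<T> work = new ArrayDeque<>();
      work.push(this);
      while (!work.isEmpty())
        for (T d : work.pop().directlyBelow())
          if (seen.add(d)) work.push(d);
      return seen;
    }
  }

  /** Common lower bounds of a and b that are not below some other common lower bound. */
  static EnumSet<T> maximalLowerBounds(T a, T b) {
    EnumSet<T> common = a.below();
    common.retainAll(b.below());
    EnumSet<T> maximal = EnumSet.noneOf(T.class);
    for (T c : common) {
      boolean dominated = false;
      for (T other : common)
        if (other != c && other.below().contains(c)) dominated = true;
      if (!dominated) maximal.add(c);
    }
    return maximal;
  }

  public static void main(String[] args) {
    // A true lattice would always report exactly one element here.
    System.out.println(maximalLowerBounds(T.HASHTABLE_TOP, T.STRING_EXACT_NOT_NULL));
    System.out.println(maximalLowerBounds(T.NULL, T.STRING_EXACT_NOT_NULL));
  }
}
```

For Hashtable:top mixed with String:exact:not_null the helper reports two incomparable candidates, String:exact and bottom:not_null, which is exactly the forced choice described above. Mixing plain null with String:exact:not_null still gives the single answer String:exact.
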
In practice HotSpot’s CP does not hang, although I suspect I can carefully arrange a Java program for which it would.  Instead, I end up triggering asserts in QA about how my lattice is lacking the proper symmetry (and hence losing constants for no other good reason, but only in weird programs).  I did construct 2 simple programs for which choosing bottom:not_null versus String:exact would find a constant (and enable an optimization) in one program and not the other… and vice versa when the choices are reversed.

Cliff

## Postlude

As you might have guessed by finding this blog, I’m off to a new adventure – this time using a JVM instead of building one (well maybe: I’m very pragmatic; may the best language win!).  And while I’ll probably still be tinkering in HotSpot from time to time, I think my future blogs will mostly be about my new adventure!

– till next time, Cliff