H2O goes to QCon SF

Math algorithms have primarily been the domain of desktop data science. With the success of scalable algorithms at Google, Amazon, and Netflix, there is an ever-growing demand for sophisticated algorithms over big data. In this talk, we get a ringside view of the making of H2O, the world's fastest and most scalable machine learning framework, and of the performance lessons learnt scaling it over EC2 for Netflix and over commodity hardware for other power users.

Top 10 Performance Gotchas is about the white-hot stories of I/O wars, S3 resets, and muxers, as well as the power of primitive byte arrays, non-blocking structures, and fork/join queues. It covers good data distribution and the fine-grain decomposition of algorithms into small blocks of parallel computation. It's a 10-point story of the rage of a network of machines against the tyranny of Amdahl's law, all while preserving the statistical properties of the data and the accuracy of the algorithm.


Slides from the talk
