Learn to manage, munge, and model big data with H2O on the Hortonworks Sandbox

Working with big data might seem like a daunting task if, like me, you've spent the majority of your college years doing pencil-and-paper proofs. For me, big data was anything that took longer than 30 minutes to ingest into single-threaded R.

For mathematicians and statisticians looking to understand widely used data platforms like Hadoop for data storage and data management, Hortonworks Sandbox is an awesome all-in-one self-teaching tool. Getting a standalone Hadoop environment on your personal computer is as easy as launching a VM.


To start doing predictive analytics, launch H2O on the server either as a simple standalone JVM or as a mapper task that uses all the nodes in the cluster. When it comes time to move from a test and research setting to a production one, the same installation and launch steps hold for however many nodes you add to the cluster.
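As a rough illustration of those two launch modes, here is a minimal Python sketch that only assembles the command lines and does not run them. The jar names `h2o.jar` and `h2odriver.jar` and the `-nodes`, `-mapperXmx`, and `-output` driver options follow the usual H2O download layout, but the heap sizes, node count, and output directory name are placeholder assumptions; check them against the H2O on Hadoop documentation for your version.

```python
# Sketch only: builds the two launch commands described above as argument lists.
# Jar paths, heap sizes, and the HDFS output directory are assumed placeholders.

def standalone_cmd(heap="4g", jar="h2o.jar"):
    """Command for a single standalone H2O JVM."""
    return ["java", f"-Xmx{heap}", "-jar", jar]


def hadoop_cmd(nodes, mapper_heap="6g", output_dir="h2o_out", jar="h2odriver.jar"):
    """Command that starts H2O as mapper tasks, one per requested node."""
    return [
        "hadoop", "jar", jar,
        "-nodes", str(nodes),        # number of H2O nodes (mappers) to start
        "-mapperXmx", mapper_heap,   # JVM heap size for each mapper
        "-output", output_dir,       # HDFS directory for the job; must not already exist
    ]


if __name__ == "__main__":
    print(" ".join(standalone_cmd()))
    print(" ".join(hadoop_cmd(nodes=1)))
```

On the single-node Sandbox you would ask for one node; on a larger cluster only the `nodes` argument changes, which is the point made above about moving to production.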

H2O and the Hortonworks Sandbox will turn the uninitiated into data scientists with a gently sloping learning curve. Both H2O and Hortonworks are open source big data powerhouses that you can learn from and perhaps eventually contribute to.

It's free to try, so get started now with the following tutorial: Predictive Analytics on H2O and Hortonworks Data Platform

For more information about how H2O operates on Hadoop, check out: H2O on Hadoop

