
Learn to manage, munge, and model big data with H2O on the Hortonworks Sandbox

Working with big data might seem like a daunting task if, like me, you spent most of your college years doing pencil-and-paper proofs. For me, big data was anything that took longer than 30 minutes to ingest into single-threaded R.

For mathematicians and statisticians who want to understand widely used data platforms such as Hadoop for data storage and management, Hortonworks Sandbox is an awesome all-in-one self-teaching tool. Getting a standalone Hadoop environment onto your personal computer is as easy as launching a VM.


To start doing predictive analytics, launch H2O on the server either as a standalone JVM or as a Hadoop mapper task that can use all the nodes in the cluster. When it's time to move from a test and research setting to production, the same installation and launch steps hold for however many nodes you add to the cluster.
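To make the two launch modes concrete, here is a minimal Python sketch that only builds the two command lines and prints them; it does not start anything. The jar names `h2o.jar` and `h2odriver.jar`, the heap sizes, and the output directory `h2o_output` are assumptions for illustration (the driver jar name in particular depends on your H2O download and Hadoop distribution), so check the tutorial for the exact values for your setup.

```python
import shlex


def standalone_command(jar="h2o.jar", heap="1g"):
    """Build the command for a single H2O node running as a plain JVM.

    The jar path and heap size are assumed defaults, not fixed values.
    """
    return ["java", f"-Xmx{heap}", "-jar", jar]


def hadoop_command(nodes, output_dir, driver_jar="h2odriver.jar", mapper_heap="1g"):
    """Build the command that starts H2O as mapper tasks on Hadoop.

    nodes       -- number of H2O nodes (one mapper each) to request
    output_dir  -- HDFS directory for the job output (assumed not to exist yet)
    driver_jar  -- assumed name; it varies by Hadoop distribution
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return [
        "hadoop", "jar", driver_jar,
        "-nodes", str(nodes),
        "-mapperXmx", mapper_heap,
        "-output", output_dir,
    ]


if __name__ == "__main__":
    # Print the commands so they can be inspected or pasted into a shell.
    print(shlex.join(standalone_command()))
    print(shlex.join(hadoop_command(nodes=1, output_dir="h2o_output")))
```

On the single-node sandbox you would ask for one node; on a larger cluster only the `nodes` argument changes, which is the point made above about moving to production.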

H2O and Hortonworks Sandbox can turn the uninitiated into data scientists along a gently sloping learning curve. Both H2O and the Hortonworks Data Platform are open-source big data powerhouses that you can learn from and perhaps eventually contribute to.

It’s free to try, so get started now with the following tutorial: Predictive Analytics on H2O and Hortonworks Data Platform

For more information about how H2O operates on Hadoop, check out: H2O on Hadoop