What does NVIDIA’s RAPIDS platform mean for the Data Science community?

Today NVIDIA announced the launch of the RAPIDS suite of software libraries, which enables GPU acceleration for data science workflows. We’re excited to partner with NVIDIA to bring GPU-accelerated open source technology to the machine learning and AI community.

“Machine learning is transforming businesses and NVIDIA GPUs are speeding them up. With the support of the open source communities and customers, H2O.ai made machine learning on GPUs mainstream and won recognition as a leader in data science and machine learning platforms by Gartner. NVIDIA’s support of the GPU machine learning community with RAPIDS, its open-source data science libraries, is a timely effort to grow the GPU data science ecosystem and an endorsement of our common mission to bring AI to the data center. Thanks to our partnership, H2O Driverless AI powered by NVIDIA GPUs has been on an exponential adoption curve — making AI faster, cheaper and easier.” – Sri Ambati, CEO and Founder, H2O.ai

Let’s look at the announcement in a bit more detail. The new software stack sets out to accelerate the entire workflow of data science and analytics by focusing on three building blocks:

DataFrame – cuDF – This is a dataframe-manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data for model training data preparation. The Python bindings of the core-accelerated CUDA DataFrame manipulation primitives mirror the Pandas interface for seamless onboarding of Pandas users.

Machine Learning Libraries – cuML – This collection of GPU-accelerated machine learning libraries will eventually provide GPU versions of all machine learning algorithms available in Scikit-Learn.

Graph Analytics Libraries – cuGraph – This is a framework and collection of graph analytics libraries.
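To make the "mirrors the Pandas interface" point about cuDF concrete, here is a minimal sketch of a load, filter, and aggregate step. It assumes that a cuDF DataFrame accepts the same constructor, boolean filtering, and groupby calls as pandas for this simple case; the column names and values are made up for illustration. If cuDF is not installed, the identical code runs on the CPU with pandas.

```python
# Assumption: cuDF mirrors the pandas calls used below. When cuDF is not
# installed, the same code runs unchanged on the CPU with pandas.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical toy data standing in for a real training table.
df = xdf.DataFrame({
    "customer": ["a", "b", "a", "c", "b"],
    "amount": [10.0, 25.0, 5.0, 40.0, 15.0],
})

# Filter rows, then aggregate per customer.
large = df[df["amount"] >= 10.0]
totals = large.groupby("customer")["amount"].sum().sort_index()

# Bring the result back to host memory if it lives on the GPU.
result = totals.to_pandas() if hasattr(totals, "to_pandas") else totals
print(result.to_dict())
```

The only line that changes between the CPU and GPU versions is the import, which is the onboarding story the announcement describes.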

A lot of the other packages in the architecture diagram have already been out there for a while, but this new announcement brings them all together with a promise of integration, ease of installation and use. cuDNN and cuGraph (previously called nvGRAPH) especially are very popular and are used by many developers. NVIDIA’s linear algebra and math libraries, which include primitives like cuBLAS, the CUDA Math Library, and others, are used as building blocks by many different frameworks, including by us here at H2O.ai.

H2O.ai is committed to accelerating automatic machine learning on NVIDIA GPUs. Nearly a year ago, H2O.ai demonstrated for the first time in the industry that statistical machine learning algorithms can be accelerated with GPUs, using our H2O4GPU (GitHub) package. This now powers our pathbreaking commercial offering, Driverless AI, which brings automatic machine learning to the enterprise. As a major contributor to XGBoost GPU and a leader in AI and ML, we are pleased to see the development of RAPIDS and we hope to see more open source development for GPU-accelerated machine learning.

The new announcements around cuDF and cuML are the successors to the GOAI project, of which H2O.ai was a founding member. As the open source leader in AI and ML, we love that NVIDIA is contributing new technology for the AI community. There are two key developments here. The first is the adoption of Apache Arrow as the standard data structure across all the different libraries, which allows for easy integration with the growing ecosystem that now supports Arrow. The second is the Python bindings for cuDF, which mimic the pandas interface. This can potentially accelerate data munging and transformation by an order of magnitude.
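Why does a shared data structure make integration easier? The sketch below is not Arrow itself. It uses only the Python standard library to illustrate the underlying idea: each column lives in one contiguous typed buffer, and a second consumer can work on a view of that buffer without copying it. The variable names and values are hypothetical.

```python
from array import array

# Row-oriented layout: one Python tuple per record.
rows = [(1, 10.0), (2, 25.0), (3, 5.0)]

# Column-oriented layout: one contiguous typed buffer per column.
ids = array("q", (r[0] for r in rows))       # 64-bit signed integers
amounts = array("d", (r[1] for r in rows))   # 64-bit floats

# A second "library" receives a view of the same memory, not a copy.
shared = memoryview(amounts)
shared[1] = 30.0   # a write through the view is visible to the owner

total = sum(shared)
print(amounts[1], total, shared.nbytes)
```

When every library agrees on the layout of those buffers, handing data from one to another needs no conversion step, which is the benefit the Arrow adoption is meant to deliver across the stack.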

We are pleased to see NVIDIA embrace data science and machine learning, which validates the core mission and vision that we’ve been driving for 7 years. We believe that machine learning will be the key part of any company’s AI strategy and transformation. We look forward to contributing to the RAPIDS project with our best-of-breed open source algorithms and to using the underlying libraries in our Driverless AI enterprise platform.

Come meet the Makers!

NVIDIA’s GPU Technology Conference (GTC) Silicon Valley, March 26-29, is the premier AI and deep learning event, providing you with training, insights, and direct access to the industry’s best and brightest. It’s where you will see the latest breakthroughs in self-driving cars, smart cities, healthcare, high-performance computing, virtual reality and more, all because of the power of AI. H2O.ai will be there in full force to share how you can immediately gain value and insights from our industry-leading AI and ML platforms. In case you hadn’t heard, H2O.ai was named a leader in the 2018 Gartner Magic Quadrant for Data Science and Machine Learning Platforms. You can get the report here.

Please visit us at booth #725 to see Driverless AI in action and talk to the Makers leading the AI movement! Our sessions will be leading edge talks that you won’t want to miss.

  1. Ashrith Barthur – Network Security with Machine Learning

    Ashrith will speak about modeling different kinds of cyber attacks and building a model that is able to identify these different kinds of attacks using machine learning.

    Room 210F – Wednesday, March 28, 9 AM to 9:50 AM.

  2. Jonathan McKinney – World’s Fastest Machine Learning with GPUs

    Jonathan will introduce H2O4GPU, a fully featured machine learning library that is optimized for GPUs, with a robust Python API that is a drop-in replacement for scikit-learn. He will demonstrate benchmarks for the most common algorithms relevant to enterprise AI and will showcase performance gains compared to running on CPUs.

    Room 220B – Thursday, March 29, 11 AM to 11:50 AM.

  3. Arno Candel – Hands-on with Driverless AI

    In this lab, Arno will show how to install and start Driverless AI, the automated Kaggle Grandmaster in-a-box software, on a multi-GPU box. He will go through the full end-to-end workflow and showcase how Driverless AI uses the power of GPUs to achieve 40x speedups on algorithms, which in turn allow it to run thousands of iterations and find the best model.

    Room LL21C – Thursday, March 29, 4 PM to 6 PM.

Can’t make it to the event? Schedule a time to talk to one of our makers!

H2O4GPU Hands-On Lab (Video) + Updates

Deep learning algorithms have benefited significantly from the recent performance gains of GPUs. However, it has been uncertain whether GPUs can speed up powerful classical machine learning algorithms such as generalized linear modeling, random forests, gradient boosting machines, clustering, and singular value decomposition.

Today I’d love to share another interesting presentation from #H2OWorld focused on H2O4GPU.

H2O4GPU is a GPU-optimized machine learning library with a Python scikit-learn API tailored for enterprise AI. The library includes all the CPU algorithms from scikit-learn and also has selected algorithms that benefit greatly from GPU acceleration.
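As a rough illustration of what a scikit-learn-style API means in practice, the sketch below fits a small k-means model. It assumes that H2O4GPU exposes a `KMeans` estimator at the package top level with the same constructor arguments and fitted attributes as scikit-learn; if the package is not installed, the code falls back to scikit-learn on the CPU. The data points are toy values chosen for illustration.

```python
import numpy as np

# Assumption: h2o4gpu provides a scikit-learn-style KMeans at the top level.
# If it is not installed, fall back to scikit-learn on the CPU.
try:
    from h2o4gpu import KMeans
except ImportError:
    from sklearn.cluster import KMeans

# Three toy points; two of them sit close together.
X = np.array([[1.0, 1.0], [1.0, 4.0], [1.0, 0.0]])

model = KMeans(n_clusters=2, random_state=1234).fit(X)

# Order the centers by their second coordinate so the output is stable.
centers = np.asarray(model.cluster_centers_)
centers = centers[np.argsort(centers[:, 1])]
print(centers)
```

Because only the import differs, existing scikit-learn code can be pointed at the GPU library without rewriting the modeling steps.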

In the video below, Jon McKinney, Director of Research at H2O.ai, discussed the GPU-optimized machine learning algorithms in H2O4GPU and showed their speed in a suite of benchmarks against scikit-learn run on CPUs.

A few recent benchmarks include:

We’re always receiving helpful feedback from the community and making updates.

Exciting updates to expect in Q1 2018 include:

  • Aggregator
  • DBSCAN
  • Kalman Filters
  • K-nearest neighbors
  • Quantiles
  • Sort

If you’d like to learn more about H2O4GPU, I invite you to explore these helpful links:

Happy Holidays!

Rosalie