H2O announces GPU Open Analytics Initiative with MapD & Continuum

H2O.ai, Continuum Analytics, and MapD Technologies have announced the formation of the GPU Open Analytics Initiative (GOAI) to create common data frameworks enabling developers and statistical researchers to accelerate data science on GPUs. GOAI will foster the development of a data science ecosystem on GPUs by allowing resident applications to interchange data seamlessly and efficiently. BlazingDB, Graphistry, and Gunrock (from UC Davis, led by CUDA Fellow John Owens) have joined the founding members to contribute their technical expertise.

The formation of the Initiative comes at a time when analytics and machine learning workloads are increasingly being migrated to GPUs. However, while individually powerful, these workloads have not been able to benefit from the power of end-to-end GPU computing. A common standard will enable intercommunication between the different data applications and speed up the entire workflow, removing latency and decreasing the complexity of data flows between core analytical applications.

At the GPU Technology Conference (GTC), NVIDIA’s annual GPU developers’ conference, the Initiative announced its first project: an open source GPU Data Frame with a corresponding Python API. The GPU Data Frame is a common API that enables efficient interchange of data between processes running on the GPU. End-to-end computation on the GPU avoids transfers back to the CPU and copying of in-memory data, reducing compute time and cost for the high-performance analytics common in artificial intelligence workloads.

Users of the MapD Core database can output the results of a SQL query into the GPU Data Frame, which can then be manipulated through Continuum Analytics’ NumPy-like Anaconda Python API or used as input to the H2O suite of machine learning algorithms without additional data manipulation. In early internal tests, this approach exhibited order-of-magnitude improvements in processing times compared to passing the data between applications on a CPU.
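The difference between handing data over by copy and handing it over through a shared buffer can be illustrated with a small CPU-side analogy. This is only a sketch using NumPy host arrays, not the GPU Data Frame API; the function names `handoff_by_copy` and `handoff_by_view` are hypothetical and exist purely for illustration.

```python
import numpy as np


def handoff_by_copy(table: np.ndarray) -> np.ndarray:
    """Simulate passing data between tools by materializing a fresh copy."""
    return table.copy()


def handoff_by_view(table: np.ndarray) -> np.ndarray:
    """Simulate a shared data frame: the consumer sees the same buffer."""
    return table.view()


# Stand-in for a query result that a downstream tool wants to consume.
query_result = np.arange(12, dtype=np.float64).reshape(4, 3)

copied = handoff_by_copy(query_result)
shared = handoff_by_view(query_result)

# The copy owns new memory; the view does not.
shares_copy = np.shares_memory(query_result, copied)
shares_view = np.shares_memory(query_result, shared)

# A write through the shared view is visible to the producer,
# while the copy keeps the old value.
shared[0, 0] = 99.0
```

In this analogy, the view costs no extra memory and no transfer time, which is the property a common in-GPU-memory format aims to provide between applications.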

“The data science and analytics communities are rapidly adopting GPU computing for machine learning and deep learning. However, CPU-based systems still handle tasks like subsetting and preprocessing training data, which creates a significant bottleneck,” said Todd Mostak, CEO and co-founder of MapD Technologies. “The GPU Data Frame makes it easy to run everything from ingestion to preprocessing to training and visualization directly on the GPU. This efficient data interchange will improve performance, encouraging development of ever more sophisticated GPU-based applications.”

“GPU Data Frame relies on the Anaconda platform as the foundational fabric that brings data science technologies together to take full advantage of GPU performance gains,” said Travis Oliphant, co-founder and chief data scientist of Continuum Analytics. “Using NVIDIA’s technology, Anaconda is mobilizing the Open Data Science movement by helping teams avoid the data transfer process between CPUs and GPUs and move nimbly toward their larger business goals. The key to producing this kind of innovation are great partners like H2O and MapD.”

“Truly diverse open source ecosystems are essential for adoption – we are excited to start GOAI for GPUs alongside leaders in data and analytics pipeline to help standardize data formats,” said Sri Ambati, CEO and co-founder of H2O.ai. “GOAI is a call for the community of data developers and researchers to join the movement to speed up analytics and GPU adoption in the enterprise.”

The GPU Open Analytics Initiative is actively welcoming participants who are committed to open source and to GPUs as a computing platform.

Details of the GPU Data Frame can be found at the Initiative’s GitHub repo.

Machine Learning on GPUs

With H2O GPU Edition, H2O.ai seeks to build the fastest artificial intelligence (AI) platform on GPUs. While deep learning has recently taken advantage of the tremendous performance boost provided by GPUs, many machine learning algorithms can benefit from the efficient fine-grained parallelism and high throughput of GPUs. Importantly, GPUs allow one to complete training and inference much faster than possible on ordinary CPUs. In this blog post, we’re excited to share some of our recent developments implementing machine learning on GPUs.

Consider generalized linear models (GLMs), which are highly interpretable compared to neural network models. As with all models, feature selection is important to control the variance. This is especially true when the number of features is large, i.e. \(p > N\), where \(p\) is the number of features and \(N\) is the number of observations in a data set. The Lasso regularizes least squares with an \(\ell_1\) penalty, simultaneously providing shrinkage and feature selection. However, the Lasso suffers from a few limitations, including an upper bound of \(N\) on the number of selected variables and a failure to do grouped feature selection. Elastic net regression overcomes these limitations by introducing an \(\ell_2\) penalty to the regularization [1]. The elastic net loss function is as follows:

\[ \min_{\beta} \; \frac{1}{2N} \lVert y - X\beta \rVert_2^2 + \lambda \left( \alpha \lVert \beta \rVert_1 + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right), \]

where \(X\) is the feature matrix, \(y\) is the response, \(\beta\) is the coefficient vector, \(\lambda\) specifies the regularization strength, and \(\alpha\) controls the penalty distribution between \(\ell_1\) and \(\ell_2\).
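As a minimal sketch, the elastic net objective can be evaluated with NumPy as shown here. It assumes the common glmnet-style parameterization, with a \(1/(2N)\) factor on the squared error and a factor of \(1/2\) on the \(\ell_2\) term; the function name `elastic_net_loss` and the toy data are illustrative and not taken from H2O.

```python
import numpy as np


def elastic_net_loss(X, y, beta, lam, alpha):
    """Elastic net objective under an assumed glmnet-style scaling.

    loss = ||y - X beta||^2 / (2N) + lam * (alpha * ||beta||_1
                                            + (1 - alpha) / 2 * ||beta||_2^2)
    """
    n = X.shape[0]
    residual = y - X @ beta
    fit = residual @ residual / (2.0 * n)
    l1 = np.sum(np.abs(beta))
    l2 = 0.5 * np.sum(beta ** 2)
    return fit + lam * (alpha * l1 + (1.0 - alpha) * l2)


# Toy data chosen so that beta fits y exactly and only the penalty remains.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])

lasso_loss = elastic_net_loss(X, y, beta, lam=0.1, alpha=1.0)  # 0.1 * 3.0
ridge_loss = elastic_net_loss(X, y, beta, lam=0.1, alpha=0.0)  # 0.1 * 2.5
mixed_loss = elastic_net_loss(X, y, beta, lam=0.1, alpha=0.5)  # 0.1 * 2.75
```

Setting \(\alpha = 1\) recovers the Lasso penalty and \(\alpha = 0\) a pure \(\ell_2\) (ridge) penalty; values in between blend the two.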

Multiple GPUs can be used to fit the full regularization path (i.e. \(\lambda\) sweep) for multiple values of \(\alpha\) or \(\lambda\).

Below are the results of computing a grid of elastic net GLMs for eight equally spaced values of \(\alpha\) between (and including) 0 (full \(\ell_2\)) and 1 (full \(\ell_1\); Lasso) across the entire regularization path of 100 \(\lambda\) values with 5-fold cross validation. Effectively, about 4000 models are trained to predict income using the U.S. Census data set (10k features and 45k records).
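The model count follows directly from the grid dimensions. The sketch below enumerates such a grid; the endpoints of the \(\lambda\) path (log-spaced from 1 down to 1e-4) are an assumption made for illustration, since the values used in the benchmark are not stated.

```python
import numpy as np

n_alphas, n_lambdas, n_folds = 8, 100, 5

# Eight equally spaced alpha values from 0 (full l2) to 1 (full l1).
alphas = np.linspace(0.0, 1.0, n_alphas)

# Assumed regularization path: log-spaced from strong to weak penalty.
lambdas = np.logspace(0, -4, n_lambdas)

# One fit per (alpha, lambda, fold) combination.
total_fits = n_alphas * n_lambdas * n_folds
```

With these dimensions, `total_fits` is 8 x 100 x 5 = 4000.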

Five scenarios are shown, including training with dual Intel Xeon E5-2630 v4 CPUs and with various numbers of P100 GPUs on the NVIDIA DGX-1. The performance gain of GPU acceleration is clear, showing a speedup of more than 35x with eight P100 GPUs over the two Xeon CPUs.

Similarly, we can apply GPU acceleration to gradient boosting machines (GBM). Here, we utilize multiple GPUs to train separate binary classification GBM models with different depths (i.e. max_depth = [6,8,10,12]) and different observation sample rates (i.e. sample_rate = [0.7, 0.8, 0.9, 1]) using the Higgs dataset (29 features and 1M records). The GBM models were trained under the same computing scenarios as the GLM cases above. Again, we see a substantial speedup of up to 16x when utilizing GPUs.
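One simple way to spread independent models over several GPUs is round-robin assignment. The sketch below assumes the two parameter lists are combined as a full cross product (16 configurations) and uses a hypothetical helper, `assign_round_robin`; it only plans the work and does not train anything or reflect how H2O schedules models.

```python
from itertools import product

max_depths = [6, 8, 10, 12]
sample_rates = [0.7, 0.8, 0.9, 1.0]

# Assumption: every depth is paired with every sample rate.
grid = [
    {"max_depth": depth, "sample_rate": rate}
    for depth, rate in product(max_depths, sample_rates)
]


def assign_round_robin(configs, n_devices):
    """Deal configurations out to devices one at a time, like cards."""
    buckets = [[] for _ in range(n_devices)]
    for index, config in enumerate(configs):
        buckets[index % n_devices].append(config)
    return buckets


# With eight devices, each one receives two of the 16 configurations.
assignment = assign_round_robin(grid, n_devices=8)
```

Because the models are independent of one another, no communication between devices is needed until the results are collected.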

GPUs enable a quantum leap in machine learning, opening up the possibility of training more models, larger models, and more complex models, all in much shorter times. Iteration cycles can be shortened, and delivery of AI within organizations can be scaled with multiple GPU boards across multiple nodes.

The elastic net GLM and GBM benchmarks shown above are straightforward implementations, showcasing the raw computational gains of GPUs. On top of this, mathematical optimizations in the algorithms can yield even more speedup. Indeed, the H2O CPU-based GLM is sparse-aware when processing the data, and our newly developed H2O CPU-based GLM implements mathematical optimizations that lead it to outperform a naive implementation by more than a factor of 10: 320s for H2O CPU GLM versus 3570s for naive CPU GLM. The figure below compares the H2O CPU GLM and H2O GPU GLM against other framework implementations. TensorFlow uses stochastic gradient descent with warm starts, the H2O CPU version and scikit-learn use a coordinate descent algorithm, and the H2O GPU GLM uses a direct matrix method that is optimal for dense matrices. We welcome improvements to these other frameworks; see http://github.com/h2oai/perf/.
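To make the distinction between coordinate descent and a direct matrix method concrete, here is a minimal NumPy sketch. It is not H2O's, scikit-learn's, or TensorFlow's implementation; it assumes the glmnet-style objective \(\frac{1}{2N}\lVert y - X\beta\rVert_2^2 + \lambda(\alpha\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\lVert\beta\rVert_2^2)\), and the function names and synthetic data are illustrative. The direct solver covers only the \(\alpha = 0\) case, where a closed form exists.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly zero inside [-t, t]."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net_cd(X, y, lam, alpha, n_sweeps=500):
    """Cyclic coordinate descent for the assumed elastic net objective."""
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.copy()                    # equals y - X @ beta for beta = 0
    col_sq = (X ** 2).sum(axis=0) / n      # (1/N) * x_j^T x_j per column
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual that excludes j.
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (
                col_sq[j] + lam * (1.0 - alpha)
            )
            # Keep the residual in sync with the updated coefficient.
            residual -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta


def ridge_direct(X, y, lam):
    """Direct dense solve of the alpha = 0 case via the normal equations."""
    n, p = X.shape
    gram = X.T @ X / n + lam * np.eye(p)
    return np.linalg.solve(gram, X.T @ y / n)


# Small synthetic problem, used only to compare the two solvers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_beta = np.array([2.0, 0.0, -1.0, 0.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=200)

beta_cd = elastic_net_cd(X, y, lam=0.1, alpha=0.0)
beta_direct = ridge_direct(X, y, lam=0.1)
beta_lasso = elastic_net_cd(X, y, lam=0.1, alpha=1.0)
```

For \(\alpha = 0\), the two solvers should agree up to numerical tolerance. The direct method spends its time in dense linear algebra, which maps well to GPUs, while coordinate descent updates one coefficient at a time and, for \(\alpha > 0\), can set coefficients exactly to zero through the soft-threshold step.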

H2O GPU edition captures the benefits of both GPU acceleration and H2O’s mathematical optimizations, taking the performance of AI to a level unparalleled in the space. Our focus on speed, accuracy and interpretability has produced tremendously positive results. The benchmarks presented in this article are evidence of this, and we will have more benchmark results to present in the near future. For more information about H2O GPU edition, please visit www.h2o.ai/gpu.

[1] H. Zou and T. Hastie. “Regularization and variable selection via the elastic net.” Journal of the Royal Statistical Society: Series B, 67(2):301-320, 2005. https://web.stanford.edu/~hastie/Papers/B67.2%20(2005)%20301-320%20Zou%20&%20Hastie.pdf

The Race for Intelligence: How AI is Eating Hardware – Towards an AI-defined hardware world

With the AI arms race reaching a fever pitch, every data-driven company is (or at least should be) evaluating its approach to AI as a means to make its own datasets as powerful as they can possibly be. In fact, any business that’s not currently thinking about how AI can transform its operations risks falling behind its competitors and missing out on new business opportunities entirely. AI is becoming a requirement. It’s no longer a “nice to have.”

It’s no secret that AI is hot right now. But the sudden surge in its popularity, both in business and the greater tech zeitgeist, is no coincidence. Until recently, the hardware required to compute and process immense, complex datasets just didn’t exist. Hardware, until now, has always dictated what software was capable of; in other words, hardware influenced software design.

Not anymore.

The emergence of graphics processing units (GPUs) has fundamentally changed how people think about data. AI is data hungry: the more data you feed your AI, the better it can perform. But this obviously presents computational requirements, namely, substantial memory (storage) and processing power. Today’s GPUs can be up to 100x faster than CPUs on highly parallel workloads, making analysis of massive data sets possible. Now that GPUs are able to process this scale of data, the potential for AI applications is virtually limitless. Previously, the demands of hardware influenced software design. Today, the opposite is true: AI is influencing how hardware is designed and built.

Here are the three macro-level trends enabling AI to eat hardware:

1.) AI is Eating Software

The old paradigm that business intelligence software relies upon rules-based engines no longer applies. Instead, the model has evolved to the point where artificial intelligence software now relies upon statistical data training, or machine learning. As statistical data training grows up, it’s feasting on rules-based engines. However, this transformation requires an immense amount of data to train the cognitive function, and AI is influencing the design of hardware to facilitate the training. AI is not only influencing hardware design, as evidenced by the rise of GPUs, but also eating the traditional rules-based software that has long been the hallmark of business intelligence.

What does this mean in practical terms? It means businesses can now use AI to address specific problems, and in a sense “manufacture” intelligence. For example, creating a human doctor involves roughly 30 years of training, from birth to the completion of a residency and a first job. But with AI, we can now create a “doctor” without 30 years of training. On a single chip, encoded with AI, a self-learning “doctor” can be trained in 11 days with petabytes of data. Not only that, you can install this “doctor” in a million places by replicating that chip, so long as there’s a device and connectivity.

This may be an extreme example, but it illustrates just how quickly AI is advancing our ability to learn from data.

2.) The Edge is Becoming More Intelligent

Another major trend supporting AI’s influence over hardware is the democratization of intelligence. In the 1980s, mainframes were the only devices powerful enough to handle large datasets. At the time, nobody could have possibly imagined that an invention like the personal computer would come along and give the computing power of a mainframe to the masses.

Fast forward 30 years, and history is repeating itself. The Internet of Things is making it possible for intelligence to be distributed even further from centralized mainframes, to literally any connected device. Today, tiny sensors have computing power comparable to that of a PC, meaning there will be many more types of devices that can process data. Soon, IoT devices of all sizes will be much more powerful than the smartphone.

This means that intelligence is headed to the edge, away from big, centralized systems like a mainframe. The cloud enables connection between edge and center, so with really smart devices on the edge, information can travel rapidly between any number of devices.

3.) Everything is Dataware

AI constantly seeks data, and business intelligence is actionable only when the AI has a steady diet of data. Thanks to the hardware movement and the shift of intelligence to the edge, there are more points of data collection than ever. However, the hardware movement is not just about collecting and storing data, but rather continuously learning from data and monetizing those insights. In the future, power is at the edge, and over time, the power of the individual device will increase. As those devices continue to process data, the monetization of that data will continue to make the edge more powerful.

AI presents us with a distributed view of the world. Because data is being analyzed at the edge, where devices learn continuously, knowledge is not only increasing at the edge, but flowing back to the center too. Everything is now dataware.

As the demand for data processing power increases across businesses, AI is transforming how enterprises shape their entire data strategy. Software is changing as a result. Gone are the days when rules-based computing was sufficient to analyze the magnitude of available data. Statistical data training is required to handle the load. But CPUs can only handle a fraction of the demand, so the demands of AI are influencing the way that hardware is designed. As hardware becomes more ubiquitous via IoT, intelligence and data are moving to the edge and the balance of power is shifting to the masses.