Happy Holidays from H2O.ai

Dear Community,

Your intelligence, support and love have been the strength behind an incredible year of growth, product innovation, partnerships, investments and customer wins for H2O and AI in 2017. Thank you for answering our rallying call to democratize AI with our maker culture.

Our mission to make AI ubiquitous is still fresh as dawn and our creativity new as spring. We are only getting started, learning, rising from each fall. H2O and Driverless AI are just the beginnings.

As we look into 2018, we see prolific innovation to make AI accessible to everyone: simplicity that unlocks scale, and a focus on making experiments faster, easier and cheaper. You will be at the center of our journey, and we look forward to delivering many more magical customer experiences.

On behalf of the team and management at H2O, I wish you all a wonderful holiday: deep, meaningful time spent with yourself and your loved ones. Come back refreshed for a winning 2018!

Gratitude for your partnership in our beautiful journey – it’s just begun!

this will be fun,


Sri Ambati
CEO & Co-Founder

P.S. #H2OWorld was an amazing experience. I invite you to watch the keynote and more than 40 talks and conversations.

It’s all Water (or should I say H2O) to me!

By Krishna Visvanathan, Co-founder & Partner, Crane Venture Partners

In the career of any venture capitalist, one dreads the “oh shit moment”. For those unfamiliar with this most technical of terms – it is that moment of clarity when a VC, in the immediate aftermath of closing one’s latest investment (often at the first post investment Board meeting), is brought back down to earth with the realisation that the shiny new investment wasn’t quite so shiny after all.

Whilst it’s not the case for every investment of course (exceptions proving the rule and all that), it was still with slight trepidation that I set off for Mountain View, CA on Dec 4th, to attend H2O World and to connect with the Board – to see what customers, ecosystems partners and Board members really thought of the company – just weeks after the completion of H2O.ai’s $40m Series C, in which Crane participated as the sole European investor.

I suspect SriSatish Ambati, Co-Founder & CEO, would probably not have asked me to pen my reflections on my first H2O World had he known this – but you can relax, Sri: I can genuinely say that H2O is one of those exceptions. What I experienced at H2O World surpassed my expectations.

Impressive attendance levels – I was staggered to see over 500 people attending when NIPS was taking place in LA at the very same time. Also pleasing was the number of attendees representing enterprise users already deploying AI (& H2O) for practical use cases to great impact (more on this later).

Open source is not a marketing strategy, it’s a way of life – and when coupled with great product is when users, partners and customers become the best evangelisers and the ecosystem takes on a life of its own. This was perhaps the enduring memory of the conference for me – the vibrancy, zeal, depth and richness of the H2O community. This is the first enterprise startup I’ve been involved in with such a highly developed community – vital to succeeding in open source (alongside a sizeable TAM). Seeing and speaking with H2O’s users on stage, in the coffee areas, at the demo tables eulogising about their uses of H2O’s products filled me with a truly warm glow.

The Data and AI revolution will not be televised, to paraphrase Gil Scott-Heron. It’s here, it’s now, but it’s only just beginning, and it will truly transform every facet of human and corporate existence. Consider predicting blood usage to save millions of dollars and, more importantly, precious blood bags by dramatically reducing wastage (shout out to Prof. Rob Tibshirani from Stanford); predicting, detecting and combating fraudsters at PayPal; predicting sepsis to save lives at a healthcare provider; or credit scoring/lending, AML, KYC and much more at one of the largest credit card companies. These are just the tip of the use case iceberg for H2O and AI. We learn every day of new users and new use cases from the growing community of over 12,600 enterprises (across many verticals – Finance & Insurance, Healthcare, Automotive, TMT, Retail to name a few) and 130,000 users. Whether you are a startup or a Fortune 1000 enterprise, if AI is not already a part of your corporate vernacular then good luck!

Software is eating the world, AI is eating software, but Data is feeding both. When Jeff Herbst of Nvidia said on stage at H2O World that “The next phase of the AI revolution is all about Data”, it was music to our ears at Crane, as we’ve been investing in Data & AI companies for a couple of years now. Data is the true value, and helping enterprises unlock the gold in their data is what Driverless AI (DAI) is all about. Whilst I was fully aware of the potential of DAI, hearing PayPal describe how DAI produced an optimised, feature-rich model/recipe an order of magnitude quicker than traditional modelling practices was simultaneously mind-blowing and illustrative of the untapped potential. The floodgates will truly open when DAI enables any user to BYOR – bring your own recipe – and share these recipes with the community.

Interpretability and visualisation of data in AI is another key plank, and yet again H2O is taking the lead. Even as a non-techie, I sat up and took notice at Professor Leland Wilkinson’s illustration of the visualisation capabilities he and H2O have built.

“Democratising AI is not a mission, it’s a duty” – SriSatish Ambati. We are only at the start of the AI revolution, but battle lines are already being drawn between the giants, deploying huge resources, stockpiling talent and building proprietary hardware/infrastructure/platforms, all in the name of harnessing AI and data for their own benefit. AI will transform human existence in profound ways that we are yet to imagine, and making it accessible and executable by the many must therefore be our duty. Whilst we make no apology for our investment in H2O being about generating a financial return for our investors, we are also firmly and proudly committed to H2O’s doctrine of democratising AI.

Unlike Kevin Costner’s, this Waterworld was truly epic. Plaudits go to the entire H2O team for putting on a great event but more so for creating and nurturing such a superb community and for building world class product wrapped in unparalleled customer-centricity. We at Crane are clearly biased but I think you can tell that we are super excited to be part of the H2O team and hope to contribute in some small way to their continued success.

For other blog posts from Crane, please check out Crane-Taking Flight.

H2O4GPU Hands-On Lab (Video) + Updates

Deep learning algorithms have benefited significantly from the recent performance gains of GPUs. However, it has been uncertain whether GPUs can speed up powerful classical machine learning algorithms such as generalized linear modeling, random forests, gradient boosting machines, clustering, and singular value decomposition.

Today I’d love to share another interesting presentation from #H2OWorld focused on H2O4GPU.

H2O4GPU is a GPU-optimized machine learning library with a Python scikit-learn API tailored for enterprise AI. The library includes all the CPU algorithms from scikit-learn and also has selected algorithms that benefit greatly from GPU acceleration.
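
The drop-in idea can be sketched as follows. This is an illustrative stand-in, not H2O4GPU's actual code: because h2o4gpu mirrors scikit-learn's estimator contract (constructor, `fit`, `predict`), code written against that contract can swap back ends with an import change. The fallback class here is a toy CPU k-means so the snippet runs even without a GPU or h2o4gpu installed.

```python
try:
    from h2o4gpu import KMeans  # GPU-accelerated implementation, if the library is installed
except ImportError:
    class KMeans:  # toy CPU stand-in exposing the same fit/predict shape
        def __init__(self, n_clusters=2, max_iter=10):
            self.n_clusters, self.max_iter = n_clusters, max_iter

        def fit(self, X):
            # Naive Lloyd's algorithm; seeds centers with the first points.
            centers = [list(x) for x in X[:self.n_clusters]]
            for _ in range(self.max_iter):
                groups = [[] for _ in centers]
                for x in X:
                    j = min(range(len(centers)),
                            key=lambda j: sum((a - b) ** 2
                                              for a, b in zip(x, centers[j])))
                    groups[j].append(x)
                centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                           for g, c in zip(groups, centers)]
            self.cluster_centers_ = centers
            return self

        def predict(self, X):
            return [min(range(len(self.cluster_centers_)),
                        key=lambda j: sum((a - b) ** 2
                                          for a, b in zip(x, self.cluster_centers_[j])))
                    for x in X]

model = KMeans(n_clusters=2).fit([[0.0], [0.1], [5.0], [5.1]])
labels = model.predict([[0.05], [5.05]])
```

Because both implementations honor the same interface, downstream code (pipelines, scoring, serialization) does not need to know which back end did the work.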

In the video below, Jon McKinney, Director of Research at H2O.ai, discussed the GPU-optimized machine learning algorithms in H2O4GPU and showed their speed in a suite of benchmarks against scikit-learn run on CPUs.

A few recent benchmarks are highlighted in the presentation.

We’re always receiving helpful feedback from the community and making updates.

Exciting updates to expect in Q1 2018 include:

  • Aggregator
  • DBSCAN
  • Kalman Filters
  • K-nearest neighbors
  • Quantiles
  • Sort

If you’d like to learn more about H2O4GPU, I invite you to explore the project’s GitHub repository and documentation.

Happy Holidays!

Rosalie

Thank You for an Incredible H2O World

#H2OWorld 2017 was an incredible experience!

It was wonderful to gather with community members from all over the world for more than 50 interesting presentations and so many great conversations.

H2O World kicked off at the Computer History Museum with a keynote by H2O.ai CEO and Co-Founder, Sri Ambati, on the Maryam-Curie stage.

Sri’s keynote was followed by more than 20 presentations on the first day from community members at innovative organizations like BeeswaxIO, Business Science, Change Healthcare, Comcast, Equifax, NVIDIA, PayPal, Stanford University, Wildbook and many others.

The second day started with a keynote from Professor Rob Tibshirani focused on “An Application of the Lasso in Biomedical data sciences”.

Professor Tibshirani’s keynote was followed by more than 25 presentations from leading organizations including Amazon’s A9, Booking.com, Capital One, Digitalist Group, IBM, MapD, NVIDIA, QQ Trend, Stanford Medicine and more.

I’d love to say thank you to everyone who joined us at H2O World. We are incredibly grateful for your continued encouragement and feedback.

Thank you also to our talented team and Shiloh Events for planning such an amazing event.

Looking forward to the next H2O World!

Happy Holidays,

Rosalie

Director of Community

P.S. Want to share how you’re using H2O.ai products? I’d be thrilled to hear from you! Drop me a note.

Driverless AI – Introduction, Hands-On Lab and Updates

#H2OWorld was an incredible experience. Thank you to everyone who joined us!

There were so many fascinating conversations and interesting presentations. I’d love to invite you to enjoy the presentations by visiting our YouTube channel.

Over the next few weeks, we’ll be highlighting many of the talks. Today I’m excited to share two presentations focused on Driverless AI – “Introduction and a Look Under the Hood + Hands-On Lab” and “Hands-On Focused on Machine Learning Interpretability”.

Slides available here.

Slides available here.

The response to Driverless AI has been amazing. We’re constantly receiving helpful feedback and making updates.

A few recent updates include:

Version 1.0.11 (December 12 2017)
– Faster multi-GPU training, especially for small data
– Increase default amount of exploration of genetic algorithm for systems with fewer than 4 GPUs
– Improved accuracy of generalization performance estimate for models on small data (< 100k rows)
– Faster abort of experiment
– Improved final ensemble meta-learner
– More robust date parsing

Version 1.0.10 (December 4 2017)
– Tooltips and link to documentation in parameter settings screen
– Faster training for multi-class problems with > 5 classes
– Experiment summary displayed in GUI after experiment finishes
– Python Client Library downloadable from the GUI
– Speedup for Maxwell-based GPUs
– Support for multinomial AUC and Gini scorers
– Add MCC and F1 scorers for binomial and multinomial problems
– Faster abort of experiment

Version 1.0.9 (November 29 2017)
– Support for time column for causal train/validation splits in time-series datasets
– Automatic detection of the time column from temporal correlations in data
– MLI improvements, dedicated page, selection of datasets and models
– Improved final ensemble meta-learner
– Test set score now displayed in experiment listing
– Original response is preserved in exported datasets
– Various bug fixes

Additional release notes can be viewed here:
http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/release_notes.html

If you’d like to learn more about Driverless AI, feel free to explore these helpful links:
– Driverless AI User Guide: http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/index.html
– Driverless AI Webinars: https://webinar.com/channel/4a90aa11b48f4a5d8823ec924e7bd8cf
– Latest Driverless AI Docker Download: https://www.h2o.ai/driverless-ai-download/
– Latest Driverless AI AWS AMI: Search for AMI-id : ami-d8c3b4a2
– Stack Overflow: https://stackoverflow.com/questions/tagged/driverless-ai

Want to try Driverless AI? Send us a note.

New versions of H2O-3 and Sparkling Water available

Dear H2O Community,

#H2OWorld is on Monday and we can’t wait to see you there! We’ll also be live streaming the event starting at 9:25am PST. Explore the agenda here.

Today we’re excited to share that new versions of H2O-3 and Sparkling Water are available.

We invite you to download them here:
https://www.h2o.ai/download/

H2O-3.16
– MOJOs are now supported for Stacked Ensembles.
– Easily specify the meta-learner algorithm type that Stacked Ensemble should use. This can be AUTO, GLM, GBM, DRF or Deep Learning.
– GBM, DRF now support custom evaluation metrics.
– The AutoML leaderboard now uses cross-validation metrics (new default).
– Multiclass stacking is now supported in AutoML. Removed the check that caused AutoML to skip stacking for multiclass.
– The Aggregator Function is now exposed in the Python/R client.
– Support for Python 3.6.

Detailed changes and bug fixes can be found here:
https://github.com/h2oai/h2o-3/blob/master/Changes.md

Sparkling Water 2.0, 2.1, 2.2
– Support for H2O models in Spark Python pipelines.
– Improved handling of sparse vectors in internal cluster.
– Improved stability of external cluster deployment mode.
– Includes latest H2O-3.16.0.2.

Detailed changes and bug fixes can be explored here:
2.2 – https://github.com/h2oai/sparkling-water/blob/rel-2.2/doc/CHANGELOG.rst
2.1 – https://github.com/h2oai/sparkling-water/blob/rel-2.1/doc/CHANGELOG.rst
2.0 – https://github.com/h2oai/sparkling-water/blob/rel-2.0/doc/CHANGELOG.rst

Hope to see you on Monday!

The H2O.ai Team

H2O.ai Raises $40 Million to Democratize Artificial Intelligence for the Enterprise

Driverless AI


Series C round led by Wells Fargo and NVIDIA

MOUNTAIN VIEW, CA – November 30, 2017 – H2O.ai, the leading company bringing AI to enterprises, today announced it has completed a $40 million Series C round of funding led by Wells Fargo and NVIDIA with participation from New York Life, Crane Venture Partners, Nexus Venture Partners and Transamerica Ventures, the corporate venture capital fund of Transamerica and Aegon Group. The Series C round brings H2O.ai’s total amount of funding raised to $75 million. The new investment will be used to further democratize advanced machine learning and for global expansion and innovation of Driverless AI, an automated machine learning and pipelining platform that uses “AI to do AI.”

H2O.ai continued its juggernaut growth in 2017 as evidenced by new platforms and partnerships. The company launched Driverless AI, a product that automates AI for non-technical users and introduces visualization and interpretability features that explain the data modeling results in plain English, thus fostering further adoption and trust in artificial intelligence.

H2O.ai has partnered with NVIDIA to democratize machine learning on the NVIDIA GPU compute platform. It has also partnered with IBM, Amazon AWS and Microsoft Azure to bring its best-in-class machine learning platform to other infrastructures and the public cloud.

H2O.ai co-founded the GPU Open Analytics Initiative (GOAI) to create an ecosystem for data developers and researchers to advance data science using GPUs, and has launched H2O4GPU, a collection of the fastest GPU algorithms on the market capable of processing massive amounts of unstructured data up to 40x faster than on traditional CPUs.

“AI is eating both hardware and software,” said Sri Ambati, co-founder and CEO at H2O.ai. “Billions of devices are generating unprecedented amounts of data, which truly calls for distributed machine learning that is ubiquitous and fast. Our focus on automating machine learning makes it easily accessible to large enterprises. Our maker culture fosters deep trust and teamwork with our customers, and our partnerships with vendors across industry verticals bring significant value and growth to our community. It is quite supportive and encouraging to see our partners lead a significant funding round to help H2O.ai deliver on its mission.”

“AI is an incredible force that’s sweeping across the technology landscape,” said Jeff Herbst, vice president of business development at NVIDIA. “H2O.ai is exceptionally well positioned in this field as it pursues its mission to become the world’s leading data science platform for the financial services industry and beyond. Its use of GPU-accelerated AI provides powerful tools for customers, and we look forward to continuing our collaboration with them.”

“It is exhilarating to have backed the H2O.ai journey from day zero: the journey from a PowerPoint to becoming the enterprise AI platform essential for thousands of corporations across the planet,” said Jishnu Bhattacharjee, managing director at Nexus Venture Partners. “AI has arrived, transforming industries as we know them. Exciting scale ahead for H2O, so fasten your seat belts!”

As the leading open-source platform for machine learning, H2O.ai is leveling the playing field in a space where much of the AI innovation and talent is locked up inside major tech titans and thus inaccessible to other enterprises. This is precisely why over 100,000 data scientists, 12,400 organizations and nearly half of the Fortune 500 have embraced H2O.ai’s suite of products that pack the productivity of an elite data science team into a single solution.

“We are delighted to lead H2O.ai’s funding round. We have been following the company’s progress and have been impressed by its high-caliber management team and success in establishing an open-source machine learning platform with wide adoption across many industries. We are excited to support the next phase of their development,” said Basil Darwish, director of strategic investments at Wells Fargo Securities.

Beyond its open source community, H2O.ai is transforming several industry verticals and building strong customer partnerships. Over the past 18 months, the company has worked with PwC to build PwC’s “GL.ai,” a revolutionary bot that uses AI and machine learning to ‘x-ray’ a business and detect anomalies in the general ledger. The product was named the ‘Audit Innovation of the Year’ by the International Accounting Bulletin in October 2017.

H2O’s signature community conference, H2O World, will take place on December 4-5, 2017 at the Computer History Museum in Mountain View, Calif.

About H2O.ai

H2O.ai’s mission is to democratize machine learning through its leading open source software platform. Its flagship product, H2O, empowers enterprise clients to quickly deploy machine learning and predictive analytics to accelerate business transformation for critical applications such as predictive maintenance and operational intelligence. H2O.ai recently launched Driverless AI, the first solution that allows any business — even ones without a team of talented data scientists — to implement AI to solve complex business problems. The product was reviewed and selected as Editor’s Choice in InfoWorld. Customers include Capital One, Progressive Insurance, Comcast, Walgreens and Kaiser Permanente. For more information and to learn more about how H2O.ai is transforming businesses, visit www.h2o.ai.

Contacts

VSC for H2O.ai
Kayla Abbassi
Senior Account Executive
kayla@vscpr.com

Laying a Strong Foundation for Data Science Work

By William Merchan, CSO, DataScience.com

In the past few years, data science has become the cornerstone of enterprise companies’ efforts to understand how to deliver better customer experiences. Even so, when DataScience.com commissioned Forrester to survey over 200 data-driven businesses last year, only 22% reported they were leveraging big data well enough to get ahead of their competition.

That’s because there’s a big difference between building predictive models and putting them into production effectively. Data science teams need the support of IT from the very beginning to ensure that issues with large-scale data management, governance, and access don’t stand in the way of operationalizing key insights about your customers. However, many enterprise companies are still treating IT involvement as an afterthought, which ultimately delays the timeline for seeing value from their data science efforts.

There are many ways that better IT management can help scale the impact of data science at your organization. Three best practices include using containers for data science environments, managing compute resources effectively, and putting work into production faster with the help of tools. Here’s how it’s done.

1. Using software containers is one of the most impactful steps you can take to implement IT management best practices. These standardized development environments ensure that the hard work your data scientists put into building predictive models won’t go to waste when it’s time to deploy their code. Without a container-based workflow, a data scientist starting a new analysis must either wait for IT to build an environment from scratch, or build one themselves using the unique combination of packages and resources they prefer and then wait for those to install or compile.

There are two major issues associated with both of these approaches: they don’t scale, and they’re slow. When data scientists are individually responsible for configuring environments as needed, their work isn’t reproducible — if it’s used in a different environment, it might not even run. Containers put the power in the hands of IT to standardize environment configuration in advance using images, which are snapshots of containers. Data scientists can launch environments from those images — which have already been vetted by IT — saving a lot of time in the long run.
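
As a concrete illustration of a vetted image, an IT team might publish something like the Dockerfile below, from which every data scientist launches identical environments. The base image and package pins here are invented for the example, not a DataScience.com artifact:

```dockerfile
# Hypothetical IT-vetted data-science image.
FROM python:3.6-slim

# Pin the approved analysis stack so every container is reproducible.
RUN pip install --no-cache-dir \
    numpy==1.13.3 \
    pandas==0.21.0 \
    scikit-learn==0.19.1

# Run as a non-root user with a standard workspace for notebooks and code.
RUN useradd --create-home ds
USER ds
WORKDIR /home/ds/work

CMD ["python"]
```

Data scientists then start from `docker run -it vetted-ds-image` instead of assembling packages by hand, and the same image ships to production unchanged.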

2. Provide ample computing power to support your data scientists’ analysis from start to finish. Empowering them to spin up compute resources in the cloud as needed ensures they never get held up by limited computing power. It also eliminates the potential additional cost of maintaining unnecessary nodes. The same idea applies to on-prem data centers. IT must carefully monitor the expansion of data science work and scale resources accordingly. It may seem obvious, but IHS Markit reports that companies not anticipating this need lose approximately $700 billion a year to IT downtime.

3. Put data science work into production right away to start seeing its value earlier on. Imagine your data science team has built a recommender system to predict what products a customer is likely to enjoy based on the products he or she has already purchased. Even if you’re satisfied with the model’s accuracy and have identified some unexpected relationships that should inform your targeting strategies, this information still needs to be integrated into your application or website for it to be valuable.

Traditionally, the pipeline that delivers those recommendations to your customers would be built by engineers and require extensive support from IT. The rise of microservices, however, gives data scientists the opportunity to deploy models as APIs that can be integrated directly into an application.

If you’re among the 78% of companies not fully realizing the return on your data science investment, chances are there’s room to improve the IT foundation you’ve laid. To learn more about the next steps, find out how to take an agile approach to data science.

About the Author

William Merchan leads business and corporate development, partner initiatives, and strategy at DataScience.com as chief strategy officer. He most recently served as SVP of Strategic Alliances and GM of Dynamic Pricing at MarketShare, where he oversaw global business development and partner relationships, and successfully led the company to a $450 million acquisition by Neustar.

H2O.ai Releases H2O4GPU, the Fastest Collection of GPU Algorithms on the Market, to Expedite Machine Learning in Python

H2O4GPU is an open-source collection of GPU solvers created by H2O.ai. It builds on the easy-to-use scikit-learn Python API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn with support for GPUs on selected (and ever-growing) algorithms. H2O4GPU inherits all the existing scikit-learn algorithms and falls back to CPU algorithms when the GPU algorithm does not support an important existing scikit-learn class option. It utilizes the efficient parallelism and high throughput of GPUs. Additionally, GPUs allow the user to complete training and inference much faster than possible on ordinary CPUs.

Today, select algorithms are GPU-enabled. These include Gradient Boosting Machines (GBMs), Generalized Linear Models (GLMs), and K-Means Clustering. Using H2O4GPU, users can unlock the power of GPUs through the scikit-learn API that many already use today. In addition to the scikit-learn Python API, an R API is in development.

Here are specific benchmarks from a recent H2O4GPU test:

  • More than 5X faster on GPUs as compared to CPUs
  • Nearly 10X faster on GPUs
  • More than 40X faster on GPUs

“We’re excited to release these lightning-fast H2O4GPU algorithms and continue H2O.ai’s foray into GPU innovation,” said Sri Ambati, co-founder and CEO of H2O.ai. “H2O4GPU democratizes industry-leading speed, accuracy and interpretability for scikit-learn users from all over the globe. This includes enterprise AI users who were previously too busy building models to have time for what really matters: generating revenue.”

“The release of H2O4GPU is an important milestone,” said Jim McHugh, general manager and vice president at NVIDIA. “Delivered as part of an open-source platform it brings the incredible power of acceleration provided by NVIDIA GPUs to widely-used machine learning algorithms that today’s data scientists have come to rely upon.”

H2O4GPU’s release follows the launch of Driverless AI, H2O.ai’s fully automated solution that handles data science operations — data preparation, algorithms, model deployment and more — for any business needing world-class AI capability in a single product. Built by top-ranking Kaggle Grandmasters, Driverless AI is essentially an entire data science team baked into one application.

Following is some information on each GPU-enabled algorithm, as well as a roadmap.

Generalized Linear Model (GLM)

  • Framework utilizes the Proximal Operator Graph Solver (POGS)
  • Solvers include Lasso, Ridge Regression, Logistic Regression, and Elastic Net Regularization
  • Improvements to original implementation of POGS:
    • Full alpha search
    • Cross Validation
    • Early Stopping
    • Added scikit-learn-like API
    • Supports multiple GPUs
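
To make the proximal-solver idea concrete, here is a minimal proximal-gradient (ISTA) sketch for the lasso, one of the penalized problems listed above. This is a single-feature CPU toy for intuition only, not POGS or H2O4GPU code:

```python
def soft_threshold(z, t):
    """Proximal operator of t * |w|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_ista(xs, ys, alpha, lr=0.01, steps=2000):
    """Minimize (1/2n) * sum((y - w*x)^2) + alpha * |w| for one feature w."""
    n, w = len(xs), 0.0
    for _ in range(steps):
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / n
        w = soft_threshold(w - lr * grad, lr * alpha)  # gradient step, then prox
    return w
```

Each iteration alternates a gradient step on the smooth squared-error term with the soft-threshold proximal step for the L1 penalty; with alpha = 0 it reduces to plain gradient descent, and larger alpha shrinks the coefficient toward zero.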

Gradient Boosting Machines (Please check out Rory’s blog on Nvidia Dev Blogs for a more detailed write-up on Gradient Boosted Trees on GPUs)

  • Based on XGBoost
  • Raw floating point data — binned into quantiles
  • Quantiles are stored in compressed form instead of as floats
  • Compressed quantiles are efficiently transferred to GPU
  • Sparsity is handled directly with high GPU efficiency
  • Multi-GPU enabled by sharing rows using NVIDIA NCCL AllReduce
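
The quantile-binning step above can be sketched in a few lines. This is an illustrative toy, not XGBoost's implementation: each raw float is replaced by a small integer bin id, which compresses far better than a float and is cheap to transfer to the GPU.

```python
def quantile_bins(values, n_bins):
    """Replace each raw float with the index of its quantile bin."""
    ranked = sorted(values)
    n = len(ranked)
    # Upper edge of each of the first n_bins - 1 quantile buckets.
    cuts = [ranked[(n * (i + 1)) // n_bins - 1] for i in range(n_bins - 1)]
    return [sum(v > c for c in cuts) for v in values]
```

With, say, 256 bins, each value fits in one byte regardless of the original float precision, and tree-split search only needs to consider the bin boundaries.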

k-Means Clustering

  • Based on NVIDIA prototype of k-Means algorithm in CUDA
  • Improvements to original implementation:
    • Significantly faster than scikit-learn implementation (50x) and other GPU implementations (5-10x)
    • Supports multiple GPUs

H2O4GPU combines the power of GPU acceleration with H2O’s parallel implementation of popular algorithms, taking computational performance levels to new heights.

To learn more about H2O4GPU click here and for more information about the math behind each algorithm, click here.

Driverless AI Blog

In today’s market, there aren’t enough data scientists to satisfy the growing demand for people in the field. With many companies moving towards automating processes across their businesses (everything from HR to Marketing), companies are forced to compete for the best data science talent to meet their needs. A McKinsey report predicts that by 2018, “The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.” H2O’s Driverless AI addresses this gap by democratizing data science and making it accessible to non-experts, while simultaneously increasing the efficiency of expert data scientists. Its point-and-click UI minimizes the complicated legwork that precedes the actual model build.

Driverless AI is designed to take a raw dataset and run it through a proprietary algorithm that automates the data exploration/feature engineering process, which typically takes ~80% of a data scientist’s time. It then auto-tunes model parameters and provides the user with the model that yields the best results. Therefore, experienced data scientists are spending far less time engineering new features and can focus on drawing actionable insights from the models Driverless AI builds. Lastly, the user can see visualizations generated by the Machine Learning Interpretability (MLI) component of Driverless AI to clarify the model results and the effect of changing variables’ values. The MLI feature eliminates the black box nature of machine learning models and provides clear and straightforward results from a model as well as how changing features will alter results.

Driverless AI is also GPU-enabled, which can result in up to 40x speedups. We demonstrated these GPU-accelerated speedups for machine learning algorithms at GTC in May 2017, having ported XGBoost, GLM, K-Means and other algorithms to GPUs for significant performance gains. This enables Driverless AI to run thousands of iterations to find the most accurate feature transforms and models.

The automatic nature of Driverless AI leads to increased accuracy. AutoDL engineers new features mechanically, and AutoML finds the right algorithms and tunes them to create the perfect ensemble of models. You can think of it as a Kaggle Grandmaster in a box. To demonstrate the power of Driverless AI, we participated in a number of Kaggle contests, where Driverless AI out of the box performed nearly as well as the best Kaggle Grandmasters.
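
The search-and-tune loop at the heart of any AutoML system can be caricatured in a few lines. This is a deliberately tiny sketch (a one-parameter threshold "model" tuned by random search), nothing like Driverless AI's actual genetic algorithm:

```python
import random

def accuracy(threshold, data):
    """Fraction of (x, label) pairs that 'predict label = (x > threshold)' gets right."""
    return sum((x > threshold) == y for x, y in data) / len(data)

def auto_tune(train, valid, n_trials=200, seed=0):
    """Randomly sample candidate models, keep the best on training data,
    then report held-out performance."""
    rng = random.Random(seed)
    best_t = max((rng.uniform(-10.0, 10.0) for _ in range(n_trials)),
                 key=lambda t: accuracy(t, train))
    return best_t, accuracy(best_t, valid)
```

Real systems replace the threshold with full model configurations and the random draw with smarter search, but the skeleton is the same: propose, score, keep the best, validate on held-out data.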

Let’s look at an example: we are going to work with a credit card dataset and predict whether or not a person is going to default on their payment next month based on a set of variables related to their payment history. After simply choosing the variable we are predicting for as well as the number of iterations we’d like to run, we launch our experiment.

As the experiment cycles through iterations, it creates a variable importance chart ranking existing and newly created features by their effect on the model’s accuracy.

In this example, AutoDL creates a feature that represents the cross validation target encoding of the variables sex and education. In other words, if we group everyone who is of the same sex and who has the same level of education in this dataset, the resulting feature would help in predicting whether or not the customer is going to default on their payment next month. Generating features like this one usually takes the majority of a data scientist’s time, but Driverless AI automates this process for the user.
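
A bare-bones illustration of out-of-fold (cross-validation) target encoding follows. This is a toy sketch of the general technique, not AutoDL's recipe: each row's encoding is the target mean of its (sex, education) group computed only from the *other* fold, which prevents the new feature from leaking the row's own label.

```python
def oof_target_encode(categories, targets, n_folds=2):
    """Out-of-fold mean-target encoding for one categorical column."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        sums, counts = {}, {}
        for i in train:
            c = categories[i]
            sums[c] = sums.get(c, 0.0) + targets[i]
            counts[c] = counts.get(c, 0) + 1
        overall = sum(targets[i] for i in train) / len(train)
        for i in range(n):
            if i % n_folds == fold:  # holdout rows get the other fold's statistics
                c = categories[i]
                encoded[i] = sums[c] / counts[c] if c in counts else overall
    return encoded
```

Categories unseen in the training fold fall back to the overall target mean; production implementations add smoothing and randomized folds on top of this skeleton.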

After AutoDL generates new features, we run the updated dataset through AutoML. At this point, Driverless AI builds a series of models using various algorithms and delivers a leaderboard ranking the success of each model. The user can then inspect and choose the model that best fits their needs.

Lastly, we can use the Machine Learning Interpretability feature to get clear and concise explanations of our model results. Four dynamic graphs are generated automatically: K-LIME, Variable Importance, Decision Tree Chart, and Partial Dependence Plot. Each one helps the user explore the model output more closely. K-LIME creates one global surrogate GLM on the entire training data and also creates numerous local surrogate GLMs on samples formed from K-Means clusters in the training data. All penalized GLM surrogates are trained to model the predictions of the Driverless AI model. Variable Importance measures the effect that a variable has on the predictions of the model, while the Partial Dependence Plot shows the effect of changing one variable on the outcome. The Decision Tree Surrogate Model clarifies the Driverless AI model by displaying an approximate flow chart of its decision-making process, along with its most important variables and interactions. Lastly, the Explanations button gives the user a plain-English sentence about how each variable affects the model.
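
The surrogate idea itself is simple enough to sketch: fit an interpretable model to the black-box model's *predictions* rather than to the raw labels. The piecewise function below is an invented stand-in for a complex model, and the least-squares line is a global surrogate; K-LIME additionally fits local surrogates on K-Means clusters of the data.

```python
def black_box(x):
    """Invented stand-in for a complex model's prediction function."""
    return 3.0 * x + 1.0 if x >= 0 else 0.5 * x + 1.0

def fit_global_surrogate(xs):
    """Least-squares line fitted to the black box's predictions."""
    preds = [black_box(x) for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_p = sum(preds) / n
    slope = (sum((x - mean_x) * (p - mean_p) for x, p in zip(xs, preds))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_p - slope * mean_x
```

The surrogate's coefficients then give an approximate, human-readable account of how the black box responds to each input, at the cost of smoothing over its nonlinearities.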

All of these graphs can be used to visualize and debug the Driverless AI model by comparing the displayed decision-process, important variables, and important interactions to known standards, domain knowledge, and reasonable expectations.

Driverless AI streamlines the machine learning workflow for inexperienced and expert users alike. For more information, click here.