How to Frame Your Business Problem for Automatic Machine Learning

Over the last several years, machine learning has become an integral part of many organizations’ decision-making at various levels. With not enough data scientists to fill the increasing demand for data-driven business processes, H2O.ai has developed a product called Driverless AI that automates several time consuming aspects of a typical data science workflow: data visualization, feature engineering, predictive modeling, and model explanation. In this post, I will describe Driverless AI, how you can properly frame your business problem to get the most out of this automatic machine learning product, and how automatic machine learning is used to create business value.

What is Driverless AI and what kind of business problems does it solve?

H2O Driverless AI is a high-performance, GPU-enabled computing platform for automatic development and rapid deployment of state-of-the-art predictive analytics models. It reads tabular data from plain text sources, Hadoop, or S3 buckets and automates data visualization and building predictive models. Driverless AI is currently targeting business applications like loss-given-default, probability of default, customer churn, campaign response, fraud detection, anti-money-laundering, demand forecasting, and predictive asset maintenance models. (Or in machine learning parlance: common regression, binomial classification, and multinomial classification problems.)

How do you frame business problems in a data set for Driverless AI?

The data that is read into Driverless AI must contain one entity per row, like a customer, patient, piece of equipment, or financial transaction. That row must also contain information about what you will be trying to predict using similar data in the future, like whether that customer in the row of data used a promotion, whether that patient was readmitted to the hospital within thirty days of being released, whether that piece of equipment required maintenance, or whether that financial transaction was fraudulent. (In data science speak, Driverless AI requires “labeled” data.) Driverless AI runs through your data many, many times looking for interactions, insights, and business drivers of the phenomenon described by the provided data set. Driverless AI can handle simple data quality problems, but it currently requires all data for a single predictive model to be in the same data set and that data set must have already undergone standard ETL, cleaning, and normalization routines before being loaded into Driverless AI.

How do you use Driverless AI results to create commercial value?

Commercial value is generated by Driverless AI in a few ways.

● Driverless AI empowers data scientists or data analysts to work on projects faster and more efficiently by using automation and state-of-the-art computing power to accomplish tasks in just minutes or hours that can take humans months.

● Like in many other industries, automation leads to standardization of business processes, enforces best practices, and eventually drives down the cost of delivering the final product – in this case a predictive model.

● Driverless AI makes deploying predictive models easy – typically a difficult step in the data science process. In large organizations, value from predictive modeling is typically realized when a predictive model is moved from a data analysts’ or data scientists’ development environment into a production deployment setting where the model is running on live data, making decisions quickly and automatically that make or save money. Driverless AI provides both Java- and Python-based technologies to make production deployment simpler.

Moreover, the system was designed with interpretability and transparency in mind. Every prediction made by a Driverless AI model can be explained to business users, so the system is viable even for regulated industries.

Customer success stories with Driverless AI

PayPal tried Driverless AI on a collusion fraud use case and found that simply running on a laptop for 2 hours, Driverless AI yielded impressive fraud detection accuracy, and running on GPU-enhanced hardware, it was able to produce the same accuracy in just 20 minutes. The Driverless AI model was more accurate than PayPal’s existing predictive model and the Driverless AI system found the same insights in their data that their data scientists did! The system also found new features in their data that had not been used before for predictive modeling. For more information about the PayPal use case, click here

G5, a real estate marketing optimization firm, uses Driverless AI in their Intelligent Marketing Cloud to assist clients in targeted marketing spending for property management. Empowered by Driverless AI technology, marketers can quickly prioritize and convert highly qualified inbound leads from G5’s Intelligent Marketing Cloud platform with 95 percent accuracy for serious purchase intent. To learn more about how G5 uses Driverless AI check out:
https://www.h2o.ai/g5-h2o-ai-partner-to-deliver-ai-optimization-for-real-estate-marketing/

How can you try Driverless AI?

Visit: https://www.h2o.ai/driverless-ai/ and download your free 21-day evaluation copy.

We are happy to help you get started installing and using Driverless AI, and here are some resources we’ve put together to enable in that process:

● Installing Driverless AI: https://www.youtube.com/watch?v=swrqej9tFcU

● Launching an Experiment with Driverless AI: https://www.youtube.com/watch?v=bw6CbZu0dKk

● Driverless AI Webinars: https://www.gotostage.com/channel/4a90aa11b48f4a5d8823ec924e7bd8cf

● Driverless AI Documentation: http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/index.html

  • You said “Every prediction made by a Driverless AI model can be explained to business users, so the system is viable even for regulated industries.” — The apparent benefits to business users who use the model is something that you might consider explaining in more detail with a follow-on post. Describe how the predictions are actionable for regulated industry use cases.

    • Patrick Hall

      Hi David – sorry just now came across this response. Among other things, the system can generate monotonic models, it can generate the values needed to create reason codes (or turn down codes, or adverse action codes) both in the GUI and at scoring time in an outside application, and it also generates many of the charts and metrics required for detailed model documentation. Of course, we are always looking to improve, refine, and automate these features in the future. For more info about the exact techniques we use to make our models more interpretable, please see: https://github.com/jphall663/jsm_2018_slides/blob/master/main.pdf or follow up here with additional questions.

      • Those slides are very insightful. Thank you for sharing, Patrick.