This post is co-authored by Animesh Singh, Nicholas Png, Tommy Li, and Vinod Iyengar.
Deep learning frameworks like TensorFlow, PyTorch, Caffe, MXNet, and Chainer have reduced the effort and skills needed to train and use deep learning models. But for AI developers and data scientists, it’s still a challenge to set up and use these frameworks in a consistent manner for distributed model training and serving.
The open source Fabric for Deep Learning (FfDL) project gives AI developers and data scientists a consistent way to use deep learning as a service on Kubernetes and to run distributed deep learning training from Jupyter notebooks for models written with any of these frameworks.
Now, FfDL is announcing a new addition that brings together that deep learning training capability with state-of-the-art machine learning methods.
Augment deep learning with best-of-breed machine learning capabilities
For anyone who wants to try machine learning algorithms with FfDL, we are excited to introduce H2O.ai as the newest member of the FfDL stack. H2O-3, H2O.ai's open source platform, is an in-memory, distributed, and scalable machine learning and predictive analytics engine that enables you to build machine learning models on big data. H2O-3 offers an expansive library of algorithms, such as Distributed Random Forest, XGBoost, and Stacked Ensembles, as well as AutoML, a powerful tool for users with less experience in data science and machine learning.
After data cleansing, or “munging,” one of the most fundamental parts of training a powerful and predictive model is properly tuning the model. For example, deep neural networks are notoriously difficult for a non-expert to tune properly. This is where AutoML becomes an extremely valuable tool. It provides an intuitive interface that automates the process of training a large number of candidate models and selecting the highest performing model based on the user’s preferred scoring method.
In combination with FfDL, H2O-3 makes data science highly accessible to users of all levels of experience. You can simply deploy FfDL to your Kubernetes cluster and submit a training job to FfDL. Behind the scenes, FfDL sets up the H2O-3 environment, runs your training job, and streams the training logs for you to monitor and debug your model. Since FfDL also supports multi-node clusters with H2O-3, you can horizontally scale your H2O-3 training job seamlessly on all your Kubernetes nodes. When model training is complete, you can save your model locally to FfDL or to a cloud object store, where it can be obtained later for serving inference.
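To make this concrete, here is a rough sketch of what a job submission could look like. The manifest fields, framework identifier, and CLI invocation below are assumptions patterned on FfDL's published examples, so consult the FfDL readme and guide for the exact format your version expects:

#Hypothetical manifest for an H2O-3 training job (field names assumed):
cat > manifest.yml <<'EOF'
name: h2o3-training-job
version: "1.0"
cpus: 1
memory: 2Gb
learners: 2                  # run a two-node H2O-3 cluster
framework:
  name: h2o3                 # framework identifier assumed; see the FfDL guide
  command: python3 train_model.py
EOF
#Submit the manifest plus the directory containing your model code:
ffdl train manifest.yml h2o-model/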
Try H2O on FfDL today!
You can find the details on how to train H2O models on FfDL in the open source FfDL readme file and guide. Deploy, use, and extend them with any of the capabilities that you find helpful. We look forward to your feedback and pull requests!
Patients, physicians, nurses, health administrators and policymakers are beneficiaries of the rapid transformations in health and life sciences. These transformations are being driven by new discoveries (etiology, therapies, and drugs/implants), market reconfiguration and consolidation, a movement to value-based care, and access/affordability considerations. The people and systems that are driving these changes are generating new engagement models, workflows, data, and most importantly, new needs for all participants in the care continuum.
As we describe in our book, Analytics 1.0 for healthcare (driven by business intelligence and reporting) is inadequate to address these transformations. A retrospective understanding of “what happened?” is limited in its usefulness because it only allows for corrective action, usually driven by resource availability. To improve wellness, care outcomes, clinician satisfaction, and patient quality of life, we ought to be leveraging little and big data via Analytics 2.0 and 3.0. This journey will require machine/deep learning and other AI methods to separate signal from noise, integrate insights into workflows, address data fidelity, and develop contextually intelligent agents.
Automating machine learning and deep learning simplifies access to these advanced technologies for the Humans of Healthcare. Both are key prerequisites to creating a data-driven, learning healthcare organization. The net results: better science, improved access and affordability, and evidence-based wellness and care.
Among everyone involved in the care continuum, physicians are at the forefront of the coming health sciences revolution. Join our all-physician panel at the H2O offices in Mountain View, CA to hear their expert perspectives and interact with them. The panel consists of three physician leaders who are driving clinical innovation with AI in their specialties and organizations:
Dr. Baber Ghauri, Physician Executive and Healthcare Innovator, Trinity Health
Dr. Esther Yu, Professor & Neuroradiologist, UCSF
Dr. Pratik Mukherjee, Professor, and Director of CIND, San Francisco VA
Moderator: Prashant Natarajan, Sr. Dir. AI Apps at H2O.ai and best-selling author/contributor to books on medical informatics & analytics
Your intelligence, support and love have been the strength behind an incredible year of growth, product innovation, partnerships, investments and customer wins for H2O and AI in 2017. Thank you for answering our rallying call to democratize AI with our maker culture.
Our mission to make AI ubiquitous is still fresh as dawn and our creativity new as spring. We are only getting started, learning, rising from each fall. H2O and Driverless AI are just the beginnings.
As we look into 2018, we see prolific innovation to make AI accessible to everyone: simplicity that opens scale, and a focus on making experiments faster, easier, and cheaper. We are so happy that you will be at the center of our journey. We look forward to delivering many more magical customer experiences.
On behalf of the team and management at H2O, I wish you all a wonderful holiday: deep, meaningful time with yourself and your loved ones, and a refreshed return for a winning 2018!
Gratitude for your partnership in our beautiful journey – it’s just begun!
To get started, request an AWS EC2 instance with GPU support. We used a single g2.2xlarge instance running Ubuntu 14.04. To set up TensorFlow with GPU support, the following software should be installed:
#To install Java, follow the steps below; type 'Y' at the installation prompt:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
#Update JAVA_HOME in ~/.bashrc (the oracle-java8-installer package installs to /usr/lib/jvm/java-8-oracle):
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
#Add JAVA_HOME to PATH:
export PATH=$PATH:$JAVA_HOME/bin
#Execute the following command to update the current session:
source ~/.bashrc
#Verify version and path:
java -version
echo $JAVA_HOME
#AWS EC2 instances have Python installed by default. Verify that Python 2.7 is already installed:
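python -V
#Install pip: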
sudo apt-get install python-pip
#Install IPython notebook
sudo pip install "ipython[notebook]"
#To run the H2O example notebooks, execute the following commands:
sudo pip install requests
sudo pip install tabulate
#Execute the following command to install unzip:
sudo apt-get install unzip
#To install Scala, follow the steps below; type 'Y' at the installation prompt:
sudo apt-get install scala
#Update SCALA_HOME in ~/.bashrc and execute the following command to update the current session:
export SCALA_HOME=/usr/share/java   #path used by the apt-get package; adjust if Scala is installed elsewhere
source ~/.bashrc
#Verify version and path:
scala -version
echo $SCALA_HOME
#Java and Scala should be installed before installing Spark.
#Get latest version of Spark binary:
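wget http://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz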
#Extract the file:
tar xvzf spark-1.6.1-bin-hadoop2.6.tgz
#Update SPARK_HOME in ~/.bashrc and execute the following command to update the current session:
export SPARK_HOME=~/spark-1.6.1-bin-hadoop2.6
source ~/.bashrc
#Add SPARK_HOME to PATH:
export PATH=$PATH:$SPARK_HOME/bin
#Verify the variables:
echo $SPARK_HOME
echo $PATH
#The latest Spark pre-built for Hadoop should be installed, with SPARK_HOME pointing to it.
#To launch a local Spark cluster with 3 worker nodes, each with 2 cores and 1g of memory, export the MASTER variable:
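export MASTER="local-cluster[3,2,1024]"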
#Download and run Sparkling Water
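#The release below is an example build from the rel-1.6 line; pick the Sparkling Water build matching your Spark version:
wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/1/sparkling-water-1.6.1.zip
unzip sparkling-water-1.6.1.zip
cd sparkling-water-1.6.1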
bin/sparkling-shell --conf "spark.executor.memory=1g"
#In order to build or run TensorFlow with GPU support, both NVIDIA’s Cuda Toolkit (>= 7.0) and cuDNN (>= v2) need to be installed.
#To install CUDA toolkit, run:
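#The repository package below matches the .deb installed in the next step (URL follows NVIDIA's standard repo layout):
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/x86_64/cuda-repo-ubuntu1410_7.0-28_amd64.deb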
sudo dpkg -i cuda-repo-ubuntu1410_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install cuda
#To install cuDNN, download the file cudnn-7.0-linux-x64-v4.0-prod.tgz after filling out the NVIDIA questionnaire.
#You need to transfer it to your EC2 instance’s home directory.
tar -zxf cudnn-7.0-linux-x64-v4.0-prod.tgz
sudo cp -R cuda/lib64/* /usr/local/cuda/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
#Reboot the system
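sudo reboot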
#Update environment variables as shown below:
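export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"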
#Since we want to open the IPython notebook remotely, we will use the IP and port options. To start the TensorFlow notebook:
IPYTHON_OPTS="notebook --no-browser --ip='*' --port=54321" bin/pysparkling
#Note that the port specified in the above command must be open on the instance (e.g., allowed in your EC2 security group).
Open http://PublicIP:54321 in a browser to start the IPython notebook console.
Click on TensorFlowDeepLearning.ipynb
Refer to this video for demo details.
#Sample .bashrc contents:
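#The sample below consolidates the exports from the steps above; the paths are assumptions, so adjust them to your installation:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SCALA_HOME=/usr/share/java
export SPARK_HOME=~/spark-1.6.1-bin-hadoop2.6
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin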
1) ERROR: Getting java.net.UnknownHostException while starting spark-shell
Make sure /etc/hosts has an entry for your hostname, e.g.:
127.0.0.1 hostname
2) ERROR: Getting a “Could not find .egg-info directory in install record” error during IPython installation
sudo pip install --upgrade setuptools pip
3) ERROR: Can’t find swig while configuring TF
sudo apt-get install swig
4) ERROR: “Ignoring gpu device (device: 0, name: GRID K520, pci bus id: 0000:00:03.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5”
Specify 3.0 for the Cuda compute capability when prompted during TensorFlow's ./configure:
Please note that each additional compute capability significantly increases your build time and binary size.
5) ERROR: Could not insert ‘nvidia_352’: Unknown symbol in module, or unknown parameter (see dmesg)
This usually indicates that the loaded NVIDIA kernel module does not match the newly installed driver; rebooting the instance typically resolves it.