H2O + TensorFlow on AWS GPU

TensorFlow on AWS GPU instance
In this tutorial, we show how to set up TensorFlow on an AWS GPU instance and run the H2O TensorFlow deep learning demo.

To get started, request an AWS EC2 instance with GPU support. We used a single g2.2xlarge instance running Ubuntu 14.04. To set up TensorFlow with GPU support, the following software should be installed:

  1. Java 1.8
  2. Python pip
  3. Unzip utility
  4. CUDA Toolkit (>= v7.0)
  5. cuDNN (v4.0)
  6. Bazel (>= v0.2)
  7. TensorFlow (v0.9)

To run the H2O TensorFlow deep learning demo, the following software should be installed:

  1. IPython notebook
  2. Scala
  3. Spark
  4. Sparkling water
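
Before installing anything, it can help to see which of the required tools are already present. A minimal check script (the tool names are simply the commands each package above provides; nothing here is specific to this tutorial):

```shell
# Check which of the required tools are already on the PATH.
# Prints the resolved path for each tool, or "missing" if it is not installed.
for tool in java python pip unzip nvcc bazel scala ipython; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $(command -v "$tool")"
  else
    echo "$tool: missing"
  fi
done
```

Any tool reported as missing is covered by one of the installation steps below.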

Software Installation:

Java:

#To install Java, follow the steps below. Type 'Y' at the installation prompt:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update 
sudo apt-get install oracle-java8-installer
#Update JAVA_HOME in ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:$JAVA_HOME/bin 

# Execute following command to update current session: 
source ~/.bashrc 

#Verify version and path: 
java -version 


Python and pip:

#AWS EC2 instances have Python installed by default. Verify that Python 2.7 is already installed:
python -V 

#Install pip 
sudo apt-get install python-pip 

#Install IPython notebook 
sudo pip install "ipython[notebook]" 

#To run the H2O example notebooks, execute the following commands: 
sudo pip install requests 
sudo pip install tabulate 
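
A quick way to confirm that both notebook dependencies import cleanly (a sketch; requests and tabulate are the two modules installed above):

```shell
# Confirm the notebook dependencies import cleanly.
# Prints OK when both modules are available, otherwise reports a missing dependency.
python -c "import requests, tabulate; print('OK')" 2>/dev/null || echo "dependency missing"
```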

Unzip utility:

#Execute the following command to install unzip:
sudo apt-get install unzip


Scala:

#Follow the steps below. Type 'Y' at the installation prompt:
sudo apt-get install scala 

#Update SCALA_HOME in ~/.bashrc:
export SCALA_HOME=/usr/share/java

#Execute the following command to update the current session: 
source ~/.bashrc 

#Verify version and path: 
scala -version 


Spark:

#Java and Scala should be installed before installing Spark. 
#Get the latest version of the Spark binary:
wget http://apache.cs.utah.edu/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz

#Extract the file: 
tar xvzf spark-1.6.1-bin-hadoop2.6.tgz 

#Update SPARK_HOME in ~/.bashrc:
export SPARK_HOME=/home/ubuntu/spark-1.6.1-bin-hadoop2.6

#Execute the following command to update the current session: 
source ~/.bashrc 


#Verify the variables: 
echo $JAVA_HOME $SCALA_HOME $SPARK_HOME

Sparkling Water:

#The latest Spark pre-built for Hadoop should be installed, with SPARK_HOME pointing to it:
export SPARK_HOME="/path/to/spark/installation"

#To launch a local Spark cluster with 3 worker nodes, each with 2 cores and 1 GB of memory per node, export the MASTER variable:
export MASTER="local-cluster[3,2,1024]"

#Download and run Sparkling Water
wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/5/sparkling-water-1.6.5.zip
unzip sparkling-water-1.6.5.zip
cd sparkling-water-1.6.5
bin/sparkling-shell --conf "spark.executor.memory=1g"

CUDA Toolkit:

#To build or run TensorFlow with GPU support, both NVIDIA's CUDA Toolkit (>= 7.0) and cuDNN (>= v2) need to be installed. 
#To install the CUDA toolkit, run:
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/x86_64/cuda-repo-ubuntu1410_7.0-28_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1410_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install cuda


#To install cuDNN, download the file cudnn-7.0-linux-x64-v4.0-prod.tgz after filling in the NVIDIA questionnaire, 
#then transfer it to your EC2 instance's home directory.
tar -zxf cudnn-7.0-linux-x64-v4.0-prod.tgz && rm cudnn-7.0-linux-x64-v4.0-prod.tgz
sudo cp -R ~/cuda/lib64 /usr/local/cuda/lib64 
sudo cp ~/cuda/include/cudnn.h /usr/local/cuda

#Reboot the system 
sudo reboot

#Update environment variables in ~/.bashrc as shown below:
export CUDA_HOME=/usr/local/cuda 
export CUDA_ROOT=/usr/local/cuda 
export PATH=$PATH:$CUDA_ROOT/bin 
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
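
Since several sections of this tutorial append exports to ~/.bashrc, it helps to guard the appends so re-running a step doesn't duplicate them. A minimal sketch of the pattern (BASHRC is a parameter of this sketch, not part of the tutorial, so the snippet can be pointed at any file):

```shell
# Append the CUDA variables to ~/.bashrc only once, guarded by a grep check.
BASHRC="${BASHRC:-$HOME/.bashrc}"
if ! grep -q "CUDA_HOME" "$BASHRC" 2>/dev/null; then
  cat >> "$BASHRC" <<'EOF'
export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
export PATH=$PATH:$CUDA_ROOT/bin
EOF
fi
```

Running the snippet twice leaves a single copy of the exports in the file.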


Bazel:

#To install Bazel (>= v0.2), run:
sudo apt-get install pkg-config zip g++ zlib1g-dev
wget https://github.com/bazelbuild/bazel/releases/download/0.3.0/bazel-0.3.0-installer-linux-x86_64.sh
chmod +x bazel-0.3.0-installer-linux-x86_64.sh
./bazel-0.3.0-installer-linux-x86_64.sh --user


TensorFlow:

#Download and install TensorFlow:
wget https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.9.0rc0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade tensorflow-0.9.0rc0-cp27-none-linux_x86_64.whl 

#To build TensorFlow from source with GPU support enabled, clone the TensorFlow repository and run its configure script, 
#answering 'Y' when asked about GPU support:
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
./configure

#To build TensorFlow, run:

bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
sudo pip install --upgrade /tmp/tensorflow_pkg/tensorflow-0.8.0-py2-none-any.whl

Run H2O Tensorflow Deep learning demo:

#Since we want to open the IPython notebook remotely, we use the IP and port options. To start the TensorFlow notebook:
cd sparkling-water-1.6.5/ 
IPYTHON_OPTS="notebook --no-browser --ip='*' --port=54321" bin/pysparkling

#Note that the port specified in the above command should be open on the system.
#Open http://PublicIP:54321 in a browser to start the IPython notebook console and click on TensorFlowDeepLearning.ipynb.
#Refer to the demo video for details.

#Sample ~/.bashrc contents:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SCALA_HOME=/usr/share/java
export SPARK_HOME=/home/ubuntu/spark-1.6.1-bin-hadoop2.6
export MASTER="local-cluster[3,2,1024]"
export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
export CUDA_HOME=/usr/local/cuda
export CUDA_ROOT=/usr/local/cuda
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/jvm/java-8-oracle/bin:/home/ubuntu/spark-1.6.1-bin-hadoop2.6/bin:/home/ubuntu/spark-1.6.1-bin-hadoop2.6/sbin:/usr/local/cuda/bin:/home/ubuntu/bin
export LD_LIBRARY_PATH=:/usr/local/cuda/lib64

Troubleshooting:

1) ERROR: Getting java.net.UnknownHostException while starting spark-shell

Make sure /etc/hosts has an entry for the name returned by the hostname command, e.g.:
127.0.0.1 <hostname>

2) ERROR: Getting "Could not find .egg-info directory in install record" error during IPython installation

sudo pip install --upgrade setuptools pip

3) ERROR: Can't find swig while configuring TF

sudo apt-get install swig

4) ERROR: "Ignoring gpu device (device: 0, name: GRID K520, pci bus id: 0000:00:03.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5"

Specify 3.0 when the configure script prompts for CUDA compute capabilities.
Please note that each additional compute capability significantly increases your build time and binary size.

5) ERROR: Could not insert 'nvidia_352': Unknown symbol in module, or unknown parameter (see dmesg)

sudo apt-get install linux-image-extra-virtual

6) ERROR: Cannot find './util/python/python_include'

sudo apt-get install python-dev

7) Find the public IP address of the system

#On EC2, the instance metadata service returns it:
curl http://169.254.169.254/latest/meta-data/public-ipv4

Demo Videos