Why We Bought A Happy Diwali Billboard


It’s been a dark year in many ways, so we wanted to lighten things up and celebrate Diwali — the festival of lights!

Diwali is a holiday that celebrates joy, hope, knowledge and all that is full of light, making it the perfect antidote to some of the more negative developments coming out of Silicon Valley recently. Throw in a polarizing presidential race where a certain candidate wants to literally build a wall along the US border, and it’s clear that inclusivity is as important as ever.

Diwali is also a great opportunity to highlight the advancements Asian Americans have made in technology, especially South Asian Americans. Google (Sundar Pichai) and Microsoft (Satya Nadella), two major forces in the world of AI, are both led by Indian Americans. They join other leaders across the technology ecosystem whom we also want to recognize.

Today we are open-sourcing Diwali. America embraced Yoga and Chicken Tikka, so why not Diwali too?

Creating a Binary Classifier to Sort Trump vs. Clinton Tweets Using NLP

The problem: Can we determine if a tweet came from the Donald Trump Twitter account (@realDonaldTrump) or the Hillary Clinton Twitter account (@HillaryClinton) using text analysis and Natural Language Processing (NLP) alone?

The solution: Yes! We’ll divide this tutorial into three parts: the first on how to gather the necessary data; the second on data exploration, munging, and feature engineering; and the third on building the model itself. You can find all of our code on GitHub (https://git.io/vPwxr).
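To make the idea of a binary text classifier concrete, here is a minimal sketch of a bag-of-words Naive Bayes model written with only the Python standard library. It is not the model from the tutorial's notebooks (the excerpt does not say which algorithm they use), and the training strings and the labels "account_a" and "account_b" are invented placeholders, not real tweets.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the log prior, then add one log likelihood per token.
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((counts[token] + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Invented placeholder strings standing in for two accounts' tweets.
train_texts = [
    "make the wall great huge tremendous",
    "tremendous rally huge crowd",
    "stronger together families children",
    "families deserve healthcare together",
]
train_labels = ["account_a", "account_a", "account_b", "account_b"]

model = NaiveBayes().fit(train_texts, train_labels)
guess_a = model.predict("huge tremendous crowd")
guess_b = model.predict("children and families together")
```

With real data, the same `fit` and `predict` calls would take the collected tweets and their source handles in place of the placeholder lists.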

Part One: Collecting the Data
Note: We are going to be using Python. The concepts translate to R, and we have some R code on GitHub that might be helpful. You can find the notebook for this part as “TweetGetter.ipynb” in our GitHub repository: https://git.io/vPwxr.

We used the Twitter API to collect tweets by both presidential candidates; these tweets became our dataset. Twitter only lets you access the latest 3,000 or so tweets from a particular handle, even though it keeps all the tweets in its own databases.
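Because of that cap, a collector typically pages backwards through a timeline until it runs out of tweets or hits the limit. The sketch below shows that loop in isolation. `fetch_page` is a hypothetical callable standing in for an authenticated API request, and `fake_fetch` is an invented stand-in data source; neither comes from the tutorial's notebook. The `count` and `max_id` parameter names follow the paging style of Twitter's user timeline endpoint, but treat the exact request details as an assumption.

```python
def collect_timeline(fetch_page, page_size=200, limit=3000):
    """Page backwards through a timeline until it is exhausted or limit is hit.

    fetch_page(count, max_id) is a hypothetical callable. It must return a
    list of tweets (dicts with an "id" key), newest first, restricted to
    ids <= max_id when max_id is not None.
    """
    tweets = []
    max_id = None
    while len(tweets) < limit:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:
            break  # nothing older is available
        tweets.extend(page)
        # Ask the next request for tweets strictly older than the last one seen.
        max_id = page[-1]["id"] - 1
    return tweets[:limit]


def fake_fetch(count, max_id=None):
    """Stand-in source: pretends the account has tweets with ids 1..500."""
    newest = 500 if max_id is None else min(max_id, 500)
    oldest_exclusive = max(newest - count, 0)
    return [{"id": i, "text": "tweet %d" % i}
            for i in range(newest, oldest_exclusive, -1)]


all_tweets = collect_timeline(fake_fetch)
capped = collect_timeline(fake_fetch, limit=250)
```

In real use, `fetch_page` would wrap whatever client library makes the authenticated request; the loop itself does not change.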

The first step is to create an app on Twitter, which you can do by visiting https://apps.twitter.com/. After completing the form you can access your app, and your…

sparklyr: R interface for Apache Spark

This post is reposted from RStudio’s announcement of sparklyr, RStudio’s extension for Spark.


  • Connect to Spark from R. The sparklyr package provides a complete dplyr backend.
  • Filter and aggregate Spark datasets then bring them into R for analysis and visualization.
  • Use Spark’s distributed machine learning library from R.
  • Create extensions that call the full Spark API and provide interfaces to Spark packages.


You can install the sparklyr package from CRAN as follows:

install.packages("sparklyr")

You should also install a local version of Spark for development purposes:

library(sparklyr)
spark_install(version = "1.6.2")

To upgrade to the latest version of sparklyr, run the following command and restart your R session:

devtools::install_github("rstudio/sparklyr")

If you use the RStudio IDE, you should also download the latest preview release of the IDE which includes several enhancements for interacting with Spark (see the RStudio IDE section below for more details).

Connecting to Spark

You can connect to both local instances of Spark as well as remote Spark clusters….

When is the Best Time to Look for Apartments on Craigslist?

A while ago I was looking for an apartment in San Francisco. There are a lot of problems with finding housing in San Francisco, mostly stemming from the fierce competition. I was checking Craigslist every single day. It still took me (and my girlfriend) a few months to find a place — and we had to sublet for three weeks in between. Thankfully we’re happily housed now but it was quite the journey. Others have talked about their search for SF housing, but I have a few tips myself:

1) While Craigslist continues to be the best resource for finding housing (it’s how I found my current apartment), there are quite a few Facebook groups that may also be useful. My search ran in weekly cycles: I would send out lots of emails, get a stream of responses, visit 1-2 places per weekday evening, and then get a stream of rejections back. If you do check Craigslist, the best times to check are Tuesday and Wednesday evenings, and then the following mornings, as the following graphic shows.

[Image: screen-shot-2016-09-26-at-2-42-45-pm]


---------- Forwarded message ---------
From: SriSatish Ambati
Date: Thu, Sep 15, 2016 at 10:17 PM
Subject: changes and all hands tomorrow.
To: team


Our focus has changed towards larger fewer deals & deeper engagements with handful of finance and insurance customers.

We took a hard look at our marketing spend, pr programs and personnel. We let go most of our amazing inside sales talent. And two of our account executives. We are not building a vertical in IOT. In all nine business folks were affected. No further changes are anticipated or necessary.

These were heroic partners in our journey. I spoke to most all of them today to personally convey the message. Some were with me for a short time, many for years – all great humans who diligently served me and h2o well. I’m grateful for their support and partnership towards my vision. I learnt a lot from each one of them and will not hesitate to assist them any ways possible personally.

Thank you. heroes in bcc. may you find fulfillment & love in your path. It’s a small world and we will all meet very soon.

Our goal as a startup is…

Distracted Driving

Last week, we started to examine the 7.2% increase in traffic fatalities from 2014 to 2015, the reversal of a near decade-long downward trend. We then broke out the data by various accident classifications, such as “speeding” or “driving with a positive BAC,” and identified those classifications that had the greatest increase. One label that showed promise for improvement was “involving a distracted driver.” According to Pew Research, the number of Americans who own a mobile device has pretty consistently risen over the past decade, as has the number of Americans who own a smartphone. Moreover, apps like Pokemon Go have built-in features that incentivize driving while playing, and these types of augmented reality games are only going to become more common.
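The breakdown described above comes down to computing a year-over-year percent change for each accident classification and ranking the results. The following is a minimal sketch of that calculation. The label names and counts are invented placeholders, not figures from the NHTSA dataset, and the function names are ours.

```python
def percent_change(before, after):
    """Year-over-year change as a percentage of the earlier value."""
    return (after - before) / before * 100.0


def rank_by_increase(counts_by_label):
    """Sort classifications from largest to smallest percent increase.

    counts_by_label maps a label to a (count_2014, count_2015) pair.
    """
    changes = {label: percent_change(before, after)
               for label, (before, after) in counts_by_label.items()}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)


# Invented placeholder counts, for illustration only.
example_counts = {
    "label_a": (1000, 1100),
    "label_b": (2000, 2050),
    "label_c": (500, 480),
}

ranked = rank_by_increase(example_counts)
overall = percent_change(1000, 1072)  # a 7.2% rise on a made-up base of 1000
```

Ranking by percent change rather than by raw counts keeps a small but fast-growing category from being hidden behind a large, stable one.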

The National Highway Traffic Safety Administration (NHTSA) defines distracted driving as “any activity that could divert a person’s attention away from the primary task of driving.” This covers a range of activities, from texting while driving to using one hand to place a call. The Governors Highway Safety Association (GHSA), an organization that “provides leadership and representation for the states and territories to improve traffic safety,” notes that states can even collect data on distracted driving…

Introducing H2O Community & Support Portals

At H2O, we enjoy serving our customers and the community, and we take pride in making them successful while using H2O products. Today, we are very excited to announce two great platforms for our customers and for the community to better communicate with H2O. Let’s start with our community first:

Community Badge

The success of every open source project depends on a vibrant community, and having an active community helps to convert an average product into a successful product. So to maintain our commitment to our H2O community, we are releasing an updated community platform at https://community.h2o.ai. This community platform is available for everyone, whether you are new to machine intelligence or are a seasoned veteran. If you are new to machine intelligence or H2O, you have an opportunity to learn from great minds, and if you are a seasoned industry veteran, you can not only enhance your skillset, you can also help others to achieve success.

Our objective is to develop this community in a way where every community member has the opportunity to establish himself or herself as a technology leader or expert by helping others. Every moment you spend here in the community,…

Fatal Traffic Accidents Rise in 2015

On Tuesday, August 30th, the National Highway Traffic Safety Administration released its annual dataset of traffic fatalities, asking interested parties to use the dataset to identify the causes of a 7.2% increase in fatalities from 2014 to 2015. As part of H2O.ai’s vision of using artificial intelligence for the betterment of society, we were excited to tackle this problem.


This post is the first in our series on the Department of Transportation dataset and driving fatalities, which we hope will culminate in a hackathon in late September, where we’ll invite community members to join forces with the talented engineers and scientists at H2O.ai to find a solution to this problem and prescribe policy changes.

To begin, we started by reading some literature and getting familiar with the data. These documents served as excellent inspiration for possible paths of analysis and guided our thinking. Our introductory investigation was based around asking a series of questions, paving the way for detailed analysis down the road. The dataset includes every (reported) accident along with several labels, from…

IoT – Take Charge of Your Business and IT Insights Starting at the Edge

Instead of just being hype, the Internet of Things (IoT) is now becoming a reality. Gartner forecasts that in 2016, 6.4 billion connected devices will be in use worldwide and 5.5 million new devices will be connected every day. These devices range from wearables, to sensors in vehicles that can detect surrounding obstacles, to sensors in pipelines that detect their own wear and tear. Huge volumes of data are collected from these connected devices, and yet companies struggle to get optimal business and IT outcomes from it.

Why is this the case?
Rule-based data models limit insights. Industry experts have a wealth of knowledge manually driving business rules, which in turn drive the data models. Many current IoT practices simply run large volumes of data through these rule-based models, but the business insights are limited by what rule-based models allow. Machine Learning/Artificial Intelligence allows new patterns to be found within stored data without human intervention. These new patterns can be applied to data models, allowing new insights to be generated for better business results.
Analytics in the backend data center delay insights. In current IoT practice, data is collected and analyzed in the backend data center (e.g. OLAP/MPP…

H2O + TensorFlow on AWS GPU

TensorFlow on AWS GPU instance
In this tutorial, we show how to set up TensorFlow on an AWS GPU instance and run the H2O TensorFlow deep learning demo.

To get started, request an AWS EC2 instance with GPU support. We used a single g2.2xlarge instance running Ubuntu 14.04. To set up TensorFlow with GPU support, the following software should be installed:

  1. Java 1.8
  2. Python pip
  3. Unzip utility
  4. CUDA Toolkit (>= v7.0)
  5. cuDNN (v4.0)
  6. Bazel (>= v0.2)
  7. TensorFlow (v0.9)

To run the H2O TensorFlow deep learning demo, the following software should be installed:

  1. IPython notebook
  2. Scala
  3. Spark
  4. Sparkling Water

Software Installation:

 # To install Java, follow the steps below. Type 'Y' at the installation prompt.
 sudo add-apt-repository ppa:webupd8team/java
 sudo apt-get update
 sudo apt-get install oracle-java8-installer
 # Update JAVA_HOME in ~/.bashrc
 # Add JAVA_HOME to PATH:
 export PATH=$PATH:$JAVA_HOME/bin
 # Execute the following command to update the current session:
 source ~/.bashrc
 # Verify version and path:
 java -version
 echo $JAVA_HOME


 # The AWS EC2 instance has Python installed by default. Verify that Python 2.7 is already installed:
 python -V
 # Install pip
 sudo apt-get install python-pip
 # Install IPython notebook
 sudo pip install "ipython[notebook]"
 # To run H2O example notebooks, execute the following commands:
 sudo pip...