Interview with Carolyn Phillips, Sr. Data Scientist, Neurensic

During Open Tour Chicago we conducted a series of interviews with data scientists attending the conference. This is the second of a multipart series recapping our conversations.

Be sure to keep an eye out for updates by checking our website or following us on Twitter @h2oai.

How did you become a data scientist?

Phillips: Until very close to two months ago I was a computational scientist working at Argonne National Laboratory.

Okay.

Phillips: I was working in material science, physics, mathematics, etc., but I was getting bored with that and looking for new opportunities, and I got hired by a startup in Chicago.

Yes.

Phillips: When they hired me they said, “We’re hiring you, and your title is Senior Quantitative Analyst,” but the very first day I showed up, they said, “Scratch that. We’ve changed your title. Your title is now Senior Data Scientist.” And I said, “Yes, all right.” It has senior in it, so I’m okay going with that.

Nice. I like it.

Phillips: So I’m a mathematician, physicist and computer scientist by training who likes to solve problems with data and algorithms, and so now I’m a data scientist.

That’s impressive. I don’t know if people have really wrapped their head around what it means to be a data scientist.

Phillips: I will say that one of the reasons why I started looking around for data scientist positions is that I come from an academic research background. I have a PhD in physics and computing, and a lot of my peers who have a very, very similar background to me – we did research together, we wrote papers together – became frustrated with academic research for various reasons. Many of them said, “Well, rats. I have a skill set that’s valuable,” and they’ve become data scientists. They work at places like Airbnb, they work at consulting firms, they work at startups. Each one of us has reached that point where we’ve said, “I’m frustrated with being an academic researcher.” I saw the direction that many of my peers had gone in saying, “I have a good skill set and it is valuable, and the place right now where that is being valued is in this area called data science, and I shall go into it,” and I said, “That’s a good idea. I’ll do that too.” There you go. That’s my story.

Wow, that’s really cool, yeah. I mean, I’m finding that the more people I talk to the greater number of paths towards becoming a data scientist I find. So what’s your biggest pain point as a data scientist?

Phillips: Data preparation. We want to get more data from our companies, and theoretically all this data is being generated by the same software everywhere. But different companies configure that software differently, and it’s a lot of work to make sure all the data you get is formatted in the same way.

Yes, I see.

Phillips: Everything I do has to have meaning. For example, I built this beautiful algorithm and I love it, and I applied it to the data, and we found this result in the data, and we said, “What is that? Look at that. Oh, my goodness, what is that? What is that? That’s crazy. That’s terrible, you know, we have to get right on that.”

Yes.

Phillips: And I thought, well, before we get too excited, let me dig down to the original raw data that generated this. Dig, dig, dig, dig, dig. Oh, we assumed that data would always come in this format, and this data came in that format, and at the end of the day it looked like something it wasn’t, so I feel like that’s actually the big challenge.

Oh, very interesting. Do you have methods of making your data more uniform?

Phillips: Well, I’m not responsible for that directly, but no. Every time we get in a new source of data it’s going to be this painful process of normalizing it so that it looks as much as possible like the other sources of data.

Thank you so much, Carolyn. That was really helpful information. It was a pleasure meeting you.

Phillips: You too.

Interview with Svetlana Kharlamova, Sr. Data Scientist, Grainger

During Open Tour Chicago we conducted a series of interviews with data scientists attending the conference. This is the first of a multipart series recapping our conversations.


How did you become a data scientist?

Kharlamova: I’m a physicist.

Okay.

Kharlamova: I came here from the academia of physics. I worked for seven years in academia for physics and math, and four years ago I switched to finance to be more of a math person than a physics person.

I see.

Kharlamova: And from finance I came to the data industry. At that time data science was booming.

Oh, okay.

Kharlamova: And I got excited about all the new stuff and technologies coming up, and here I am.

Okay, nice. So what business do you work for now?

Kharlamova: I work for Grainger. We’re focused on equipment distribution, serving as a connector between manufacturing plants, factories and consumers.

So what are some of the problems that you guys are looking to solve?

Kharlamova: Building recommendation engines for customers. For that you need to leverage natural language processing and positive logic.

What resources do you use to stay on top of the information in the data science world? Are there blogs that you read or like, or places that you go?

Kharlamova: Staff communities and data science communities are important sources of information.

Yes. That’s great. And is there any advice that you would have for someone who’s an up and coming data scientist, or someone who’s just generally interested in the field?

Kharlamova: Advice to somebody who’s generally interested in the field?

Yes, about becoming a data scientist.

Kharlamova: It’s a difficult question, because if a person takes a one-year course on Coursera or somewhere else on data science, it doesn’t mean that they’re a data scientist yet, because you need to see the problem in the big picture.

Yes.

Kharlamova: You need to be able to identify the challenges, the problem and various solutions. You cannot explore everything. You need to narrow down your choice.

Yes, okay.

Kharlamova: You also need to have substantial knowledge of mathematics, statistics and computer science. But understand that you don’t need to immediately start using a sophisticated random forest model. Maybe you can just use simple algebra. Maybe it’s a question of two plus two.

Right.

Kharlamova: And then you don’t need all these assumptions and approximations. Because I’m a physicist, I like a defined correct answer much more than something fuzzy. To be successful as a data scientist you need to decide how best to approach a problem, then find a solution that’s as simple as possible.

Okay. I see. That’s great advice. So it’s not just about having the knowledge, but it’s also about having an approach that is, like you said, simple, that you can probably use more often to provide a clear answer. That’s great, great advice.

H2O Day at Capital One

One of our most important partners is Capital One, and we’re proud to have been working with them for over a year. One of the world’s leading financial services providers, Capital One has a strong reputation for being an extremely data- and technology-focused organization. That’s why, when the Capital One team invited us to their offices in McLean, Virginia for a full day of H2O talks and demos, we were delighted to accept. Many key members of Capital One’s technology team were among the 500+ attendees at the event, including Jeff Chapman, MVP of Shared Technology; Hiren Hiranandani, Lead Software Engineer; Mike Fulkerson, VP of Software Engineering; and Adam Wenchel, VP of Data Engineering.

A major theme throughout the day was “vertical is the new horizontal,” an idea presented by our CEO Sri Ambati about how every company is becoming a technology company. Sri pointed out that software is becoming increasingly ubiquitous at organizations at the same time that code is becoming a commodity. Today, the only assets that companies can defend are their community and brand. Airbnb is more valuable than most hospitality companies, despite owning no property, and Uber is more valuable than most transportation companies, despite owning no vehicles. And if “software is eating the world,” then artificial intelligence (AI) is eating software, as traditional rules-based models no longer cut it in today’s rapidly changing world.

Our partnership started about a year ago, where we met in California, and learned about the value proposition of H2O. To be honest, I think we were all floored by what we saw. – Jeff Chapman

This was obviously an important message for attendees at Capital One, who were looking to learn more about AI and machine learning. Of particular interest was how machine learning and AI can help with use cases such as personalization and fraud detection, and how the technology can drive future data-driven decision making. Attendees also had a chance to share their experiences using H2O to analyze and score models with their colleagues across business units. The event fit perfectly into our vision of a grassroots community that encourages cooperation and the sharing of information. We look forward to continuing to work with Capital One, and all of our partners, to promote the democratization of data science and the growth of open source communities.

Visit us online to find a local event where you can meet with the makers of H2O in person. Please also don’t forget to see the video of our time at Capital One here!

Drink in the Data with H2O at Strata SJ 2016

It’s about to rain data in San Jose when Strata + Hadoop World comes to town March 29 – March 31st.

H2O has a waterfall of action happening at the show. Here’s a rundown of what’s on tap.
Keep it handy so you have less chance of FOMO (fear of missing out).

Hang out with H2O at Booth #1225 to learn more about how machine learning can help transform your business and find us throughout the conference:

Tuesday, March 29th

Wednesday, March 30th

  • 12:45pm – 1:15pm Meet the Makers: The brains and innovation behind the leading machine learning solution are on hand to hack with you
    • #AskArno – Arno Candel, Chief Architect and H2O algorithm expert
    • #RuReady with Matt Dowle, H2O Hacker and author of R data.table
    • #SparkUp with Michal Malohlava, principal developer of Sparkling Water
    • #InWithErin – Erin LeDell, Machine Learning Scientist and H2O ensembles expert
  • 2:40pm – 3:20pm H2O highlighted in An introduction to Transamerica’s product recommendation platform
  • 5:50pm – 6:50pm Booth Crawl. Have a beer on us at Booth #1225
  • 7:00pm – 9:00pm Let it Flow with H2O – Drinks + Data at the Arcadia Lounge. Grab your invite at Booth #1225

Thursday, March 31st

  • 12:45pm – 1:15pm Ask Transamerica. Vishal Bamba and Nitin Prabhu of Transamerica join us at Booth #1225 for Q&A with you!

The Top 10 Most Watched Videos From H2O World 2015

Now that we’re a few months out from H2O World we wanted to share with you all what the most popular talks were by online viewership. The talks covered a variety of topics from introductions, to in-depth examinations of use cases, to wide-ranging panels.

Introduction to Data Science
Featuring Erin LeDell, Statistician and Machine Learning Scientist
An introductory talk for people new to the field of data science.

Intro to R, Python, Flow
Featuring Amy Wang, Math Hacker
A hands-on demonstration of how to run H2O in R and Python and an introduction to the Flow GUI.

Machine Learning at Comcast
Featuring Andrew Leamon, Director of Engineering Analysis, Comcast and Chushi Ren, Software Engineer, Comcast
An inside look at how Comcast leverages machine learning across its business units.

Migrating from Proprietary Analytics Stacks to Open Source H2O
Featuring Fonda Ingram, Technical Manager
A ten-year SAS veteran explains how to migrate from proprietary software to an open source environment.

Top 10 Data Science Pitfalls
Featuring Mark Landry, Product Manager
A Kaggle champion offers an overview of ten top pitfalls to avoid when performing data science.

Featuring Erin LeDell, Statistician and Machine Learning Scientist
Another popular talk from Erin, this time providing an overview specifically of ensemble learning.

Sparkling Water
Featuring Michal Malohlava, Software Engineer
An introduction to Sparkling Water, H2O’s Spark API, by one of its key architects.

Panel – Competitive Data Science
Featuring Arno Candel, Chief Architect; Phillip Adkins, Data Scientist, Banjo; Nick Kridler, Data Scientist, Stitch Fix; Mark Landry, Product Manager; John Park, Principal Data Scientist, Hewlett-Packard Enterprise; Lauren Savage, Data Scientist, AT&T; and Guocong Song, Data Scientist, Playground.Global
A panel discussion covering all aspects of competitive data science.

Survey of Available Machine Learning Frameworks
Featuring Brenden Herger, Data Scientist, Capital One
An overview of available machine learning frameworks and an analysis of why teams use specific ones.

Panel – Industrial Data Science – Practitioners’ Perspective
Featuring SriSatish Ambati, CEO & Cofounder; Xavier Amatriain, VP of Engineering, Quora; Scott Marsh, Research & Development Analyst, Progressive Insurance; Taposh Dutta Roy, Manager, Kaiser Permanente; Nachum Shacham, Principal Data Scientist, PayPal; and Daqing Zhao, Director of Advanced Analytics, Macy’s
A discussion of large data science deployments by the people most familiar with them.

A great selection of talks if we do say so ourselves! Is it too early to start counting the days to H2O World 2016?

H2O World from an Attendee’s Perspective

Data Science is like Rome, and all roads lead to Rome. H2O WORLD is the crossroads, pulling in a confluence of math, statistics, science and computer science and incorporating all avenues of business. From the academic, research-oriented models to the business and computer science analytics implementations of those ideas, H2O WORLD informs attendees on H2O’s ability to help users and customers explore their data and produce a prediction or answer a question.

I came to H2O World hoping to gain a better understanding of H2O’s software and of Data Science in general. I thoroughly enjoyed attending the sessions, following along with the demos and playing with H2O myself. Learning from the hackers and Data Scientists about the algorithms and science behind H2O and seeing the community spirit at the Hackathons was enlightening. Listening to the keynote speakers, both women, describe our data-influenced future and hearing the customer’s point of view on how H2O has impacted their work has been inspirational. I especially appreciated learning about the potential influence on scientific and medical research and social issues and H2O’s ability to influence positive change.

Curiosity led me to delve into the world of Data Science and as a person with a background of science and math, I wasn’t sure how it applied to me. Now I realize that there is virtually no discipline which cannot benefit from the methods of Data Science and that there is great power in asking the right questions and telling a good story. H2O WORLD broadened my horizons and gave me a new perspective on the role of Data Science in the world. Data science can be harnessed as a force for social good where a few people from around the globe can change the world. H2O World 2015 was a great success and I truly enjoyed learning and being there.

H2O at ODSC SF 2015!

As promised, we’re here reporting from the floor of the Open Data Science Conference (ODSC). It’s been another wild day for us, with an early start at 7:30am to set up ahead of the show. However, the long days are all worth it for a chance to see you all in the field. While we thought bringing two boxes of booklets would be enough, we ended up running out again!

Located in the luxurious Marriott Waterfront hotel, ODSC is hosting 20 workshops, 50 speaking sessions and a thousand attendees. Speakers include Brian Granger, co-founder of Jupyter; Anthony Goldbloom, CEO & Founder of Kaggle; Andreas Mueller, Assistant Research Engineer at the NYU Center for Data Science; and Wes McKinney, creator of the Pandas Python Data Analysis Library. On Sunday, data scientist Hank Roark will be joining this list of prestigious speakers to give a talk on “Big Data Machine Learning and Data Products with Python and H2O.” Looking forward to seeing you there!

Questions? Tweet us @h2oai

H2O at ML Conf SF 2015

H2O is ubiquitous, and just like H2O, our team is everywhere! Today we attended the 2015 Machine Learning Conference in San Francisco. Located at the gorgeous Julia Morgan Ballroom, the ML Conference brought together some of the world’s foremost experts on machine learning, including the tireless Xavier Amatriain, VP of Engineering at Quora, fresh off his talk at H2O World. The speaking lineup also included folks from IBM, CMU, Kaggle, Ayasdi, ChaLearn, Google, Netflix, Numenta, Stitch Fix, Ufora, Intel, Walmart Labs, UC Irvine, Skymind, Slack and Baidu. As expected, many of our H2O fans were among attendees, leading to so much traffic at our booth that we ran out of booklets!

Tomorrow we’re off to another two days of fun at the Open Data Science Conference being held at the luxurious Marriott Waterfront in San Francisco. Looking forward to seeing you there!

Questions? Tweet us @h2oai

H2O World Third Day Wrap-Up

H2O fans, we know that distance and the twin holidays of Veterans Day and Diwali kept many of you from attending the grand finale of H2O World, but we want to at least give you a taste of all that went on at the Computer History Museum in Mountain View. Day 3 of H2O World got off to a strong start with a massive panel on creating a culture of data-driven decision making. The panel included experts from AT&T, among others.


The morning continued with talks from GoPro and Board Member Michael Marks, Conor Jensen, Analytics Program Director at Zurich North America, and a very informative session on GLRM from Madeleine Udell! Before heading over to our delicious food trucks, H2O World attendees also had the opportunity to hear several keynotes. The first was an explanation of what the next generation of data products would look like from none other than data science expert Hilary Mason!


Hilary’s talk was followed by keynotes from Kaiser Permanente Vice President, Jason P Jones, who spoke about how machine learning can help with clinical decision making, and from Stanford Professor Rob Tibshirani, who spoke about using the lasso method for high dimensional supervised learning. Attendees who weren’t immediately distracted by the onset of the lunch hour had a chance to get their books signed by Hilary and Rob.


Lest you think that all the fun happened in the morning, we want to assure you that the afternoon was jam-packed as well! We hosted three panels on algorithm design and application gotchas, machine learning in financial services and machine learning in natural language processing, respectively. We also had a series of terrific talks from fellow H2O fans just like yourselves at Transamerica Corporation, Progressive, Macy’s, Nielsen Catalina Solutions, Lexalytics, Sociogramics, Altiscale, MarketShare, Machine Zone, Data Fellas, Epoch and Trendkite.

If you were at the show tell us what YOUR favorite session was by tweeting us @h2oai #h2oworld. We’ll give you a hint as to our favorite part…


Questions? Tweet us @h2oai #h2oworld

H2O World Second Day Wrap-Up

H2O fans, we didn’t think that our second day could top our first, but somehow it did! Still, although we had record attendance, we know that a lot of you aren’t here. While we can’t hope to get across all that’s happened, we do want to share some of the highlights. The morning started off with CEO Sri Ambati welcoming attendees and giving them a special sneak peek at the future roadmap of H2O.


Before even getting to lunch attendees were treated to a fascinating talk from world-renowned data science expert Monica Rogati, use case demonstrations from PayPal, Comcast, and Quora, and an explanation of Consensus Lasso from the man Sri calls “the Bob Dylan of data science,” Stanford Professor Stephen Boyd. H2O World attendees not completely entranced by our wonderful collection of food trucks got some one-on-one time to ask Monica Rogati questions during a special “fireside chat.”


We’re focused on achieving impact, and nothing impacts people as much as their health. That’s why we were so proud to be a part of Kaiser Permanente Health Data Project Lead Taposh Dutta Roy’s afternoon talk on using data science to help battle cancer. The afternoon also featured talks from H2O users at PayPal, Capital One, AT&T, Google Analytics, GenomeDx and 6sense. A series of panels on industrial data science, the future of data science and the last mile of data science delivery were part of the afternoon’s agenda as well.


Of course, no event is complete without a party, and H2O World is no different! After a full day of listening to the world’s leading data science practitioners talk about their work, attendees were bussed over to another world at the Mos Eisley Cantina. Fortunately, none of our open source Jedi fell prey to the vile machinations of the Sith, but it was a close call!


Questions? Tweet us @h2oai #h2oworld