Avkash Chauhan

What is new in Sparkling Water 2.0.3 Release?

This release is built on H2O core 3.10.1.2.

Important Feature:

This architectural change lets Sparkling Water connect to an existing H2O cluster. The benefit is that we are no longer affected by Spark killing its executors, so the solution should be more stable in environments with many H2O/Spark nodes. We are working on an article on how to use this very important feature in Sparkling Water 2.0.3.

Release notes: https://0xdata.atlassian.net/secure/ReleaseNote.jspa?projectId=12000&version=16601

2.0.3 (2017-01-04)

  • Bug
    • SW-152 – ClassNotFound with spark-submit
    • SW-266 – H2OContext shouldn’t be Serializable
    • SW-276 – ClassLoading issue when running code using SparkSubmit
    • SW-281 – Update sparkling water tests so they use correct frame locking
    • SW-283 – Set spark.sql.warehouse.dir explicitly in tests because of SPARK-17810
    • SW-284 – Fix CraigsListJobTitlesApp to use local file instead of trying to get one from hdfs
    • SW-285 – Disable timeline service also in python integration tests
    • SW-286 – Add missing test in pysparkling for conversion RDD[Double] -> H2OFrame
    • SW-287 – Fix bug in SparkDataFrame converter where key wasn’t random if not specified
    • SW-288 – Improve performance of Dataset tests and call super.afterAll
    • SW-289 – Fix PySparkling numeric handling during conversions
    • SW-290 – Fixes and improvements of task used to extend h2o jars by sparkling-water classes
    • SW-292 – Fix ScalaCodeHandlerTestSuite
  • New Feature
    • SW-178 – Allow external h2o cluster to act as h2o backend in Sparkling Water
  • Improvement
    • SW-282 – Integrate SW with H2O 3.10.1.2 ( Support for external cluster )
    • SW-291 – Use absolute value for random number in sparkling-water in internal backend
    • SW-295 – H2OConf should be parameterized by SparkConf and not by SparkContext

Please visit https://community.h2o.ai to learn more about it, provide feedback and ask for assistance as needed.

@avkashchauhan | @h2oai

What is new in the latest H2O release, 3.10.2.1 (Tutte)?

Today we released H2O version 3.10.2.1 (Tutte). It’s available on our Downloads page, and release notes can be found here.


Photo Credit: https://en.wikipedia.org/wiki/W._T._Tutte

Top enhancements in this release:

GLM MOJO Support: GLM now supports our smaller, faster, more efficient MOJO (Model ObJect, Optimized) format for model publication and deployment (PUBDEV-3664, PUBDEV-3695).

ISAX: We actually introduced ISAX (Indexable Symbolic Aggregate ApproXimation) support a couple of releases back, but this version features more improvements and is worth a look. ISAX allows you to represent complex time series patterns using a symbolic notation, reducing the dimensionality of your data and allowing you to run our ML algos or use the index for searching or data analysis. For more information, check out the blog entry here: Indexing 1 billion time series with H2O and ISAX. (PUBDEV-3367, PUBDEV-3377, PUBDEV-3376)
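To make the idea of a symbolic notation concrete, here is a minimal sketch of plain SAX, the representation that iSAX builds its index on. It is not H2O's implementation: the function name `sax_word` and its parameters are made up for illustration, and it uses only the Python standard library. The series is z-normalized, averaged over equal-width segments, and each segment mean is mapped to a letter using breakpoints that split the standard normal distribution into equally likely regions.

```python
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet_size):
    """Return a SAX word for a numeric series (illustrative helper, not an H2O API)."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")

    # Step 1: z-normalize; a constant series is mapped to all zeros.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # Step 2: Piecewise Aggregate Approximation, one mean per segment.
    paa = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        paa.append(mean(z[start:end]))

    # Step 3: breakpoints cut the standard normal into equiprobable regions.
    std_normal = NormalDist()
    breakpoints = [std_normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]

    # Step 4: the letter index is the number of breakpoints below the value.
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(letters[sum(v > b for b in breakpoints)] for v in paa)


if __name__ == "__main__":
    # Eight points reduced to a four-letter word over a four-letter alphabet.
    print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))
```

With these inputs, the steadily rising series becomes the word `abcd`: eight numbers are reduced to four symbols, which is the dimensionality reduction described above.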

GLM: Improved feature and parameter descriptions for GLM. Next focus will be on improving documentation for the K-Means algorithm (PUBDEV-3695, PUBDEV-3753, PUBDEV-3791).

Quasibinomial support in GLM: The quasibinomial family is similar to the binomial family except that, where binomial models support only 0/1 as target values, the quasibinomial family allows for two arbitrary values. This feature was requested by advanced users of H2O for applications such as implementing their own advanced estimators. (PUBDEV-3482, PUBDEV-3791)

GBM/DRF high cardinality accuracy improvements: Fixed a bug in the handling of large categorical features (cardinality > 32) that had been present since the first release of H2O-3. Certain categorical tree split decisions for such features were incorrect, essentially sending observations down the wrong path at any such split point in the decision tree. The error was systematic and consistent between in-H2O and POJO/MOJO, and led to lower training accuracy (and often to lower validation accuracy). The handling of unseen categorical levels (in training and testing) was also inconsistent: unseen levels would go left or right without any reason. Now they consistently follow the path of missing values. Generally, models involving high-cardinality categorical features should now have improved accuracy. This change might require re-tuning of model parameters for best results, in particular the nbins_cats parameter, which controls the number of separable categorical levels at a given split and has a large impact on the amount of memorization of per-level behavior that is possible: higher values generally (over)fit more.
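The new rule for unseen levels can be sketched in a few lines of plain Python. This is only an illustration of the behavior described above, not H2O's internal tree code: the function `route_at_split` and its arguments are hypothetical names chosen for this example.

```python
def route_at_split(level, left_levels, trained_levels, missing_goes_left):
    """Decide which child an observation goes to at one categorical split.

    level             -- the observation's category, or None if missing
    left_levels       -- set of training levels the split sends left
    trained_levels    -- set of all levels seen during training
    missing_goes_left -- direction chosen for missing values at this split
    (All names are illustrative assumptions, not H2O parameters.)
    """
    if level is None or level not in trained_levels:
        # Unseen levels take exactly the same path as missing values.
        return "left" if missing_goes_left else "right"
    return "left" if level in left_levels else "right"


if __name__ == "__main__":
    trained = {"red", "green", "blue"}
    left = {"red", "green"}
    for value in ["red", "blue", None, "purple"]:
        print(value, "->", route_at_split(value, left, trained, missing_goes_left=False))
```

Here `"purple"` never appeared in training, so it is routed the same way as `None`, whichever direction the split picked for missing values.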

Direct Download: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/1/index.html

For details on each PUBDEV-* item, please see the release notes linked at the top of this article.

According to VP of Engineering Bill Gallmeister, this release consists of significant work done by his engineering team. For more information on these features and all the other improvements in H2O version 3.10.2.1, review our documentation.

Happy Holidays from the entire H2O team!

@avkashchauhan (Avkash Chauhan)

Introducing H2O Community & Support Portals

At H2O, we enjoy serving our customers and the community, and we take pride in making them successful while using H2O products. Today, we are very excited to announce two great platforms for our customers and for the community to better communicate with H2O. Let’s start with our community first:


The success of every open source project depends on a vibrant community, and having an active community helps to convert an average product into a successful product. So to maintain our commitment to our H2O community, we are releasing an updated community platform at https://community.h2o.ai. This community platform is available for everyone, whether you are new to machine intelligence or are a seasoned veteran. If you are new to machine intelligence or H2O, you have an opportunity to learn from great minds, and if you are a seasoned industry veteran, you can not only enhance your skillset, you can also help others to achieve success.

Our objective is to develop this community in a way where every community member has the opportunity to establish himself or herself as a technology leader or expert by helping others. Every moment you spend here in the community, either by creating or consuming content, will not only help you to learn more, but will also help to establish your own brand as a reputed member of our machine intelligence community. Here are some highlights for our community:

  • The community content is organized into three main sections:
    • Questions
    • Ideas
    • Knowledge Base Articles
  • The content in the above sections is distributed among various technology groups called spaces, e.g. Algorithms, H2O, Sparkling Water, Exception, Debugging, Build, etc.
  • Every post needs to be part of a specific space so that experts in that space can provide faster and better responses. A list of all spaces is here.
  • As a visitor, you are welcome to visit every section of the community and learn from posts from community members.
  • Once logged in as a community member using OpenID®, you can ask questions, write knowledge base articles, and propose ideas or feature requests for our products.
  • You are welcome to provide feedback on others’ content by liking a KB article, question, or answer, or simply by up-voting an idea.
  • As you spend more and more time in the community, you will be given higher roles in the management and improvement of your own community.
  • As a logged-in member of the community, every activity adds points toward your reputation, and as you spend more time in the community, you will rank higher among your peers and establish yourself as an expert or a technology leader.
  • Please make sure you read the Guidelines before posting a question.
  • We are working towards making the site more integrated with other social platforms such as Twitter® and Facebook®, as well as adding support for other OpenID providers.

Now let me introduce our updated enterprise support portal:


H2O has been used by over 60K data scientists since its initial release, and more than 7,000 organizations worldwide now use H2O applications, including H2O-3 and Sparkling Water. To assist our enterprise customers, we have revamped our enterprise support portal, which is available at https://support.h2o.ai. With this new portal, we are able to provide SLA-based, 24×7 support for our enterprise customers. Please visit this page to learn about the H2O enterprise support offering. While this support portal is designed primarily to assist our enterprise customers, it is also open to everyone who is using any of the free, open source H2O applications.

You can open a support incident with the H2O support team in one of two ways:

  • Through the Support Portal
    • Please visit the support portal at https://support.h2o.ai, and select “NEW SUPPORT INCIDENT”.
    • You don’t need to be logged in to the support portal to open a new incident; however, it is advisable to have an account so that you can monitor the ticket progress.
    • You will have an opportunity to set the incident priority: Low, Medium, High, or Urgent.
  • By Email
    • Send an email to support@h2o.ai describing your problem clearly.
    • Please attach, in zipped format, any other information that could help identify the root cause.

When opening a support incident, please provide your H2O version, your Hadoop or Spark version (if applicable), and any logs, stack dumps, or other information that might be helpful when troubleshooting the problem. Whether you are an H2O enterprise customer or just using one of our free, open source H2O applications, both of these venues are open for you to bring your questions or comments. We are listening and are here to help.
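If you are sending logs by email, one simple way to get them into zipped format is a short Python script using the standard `zipfile` module. This is only a sketch: the helper name `bundle_for_support` and the file names in the usage example are placeholders, not part of any H2O tool.

```python
import zipfile
from pathlib import Path


def bundle_for_support(paths, archive_path):
    """Zip the given files into one archive and return the names stored in it.

    Illustrative helper, not an H2O utility. Paths that are not regular
    files are skipped. Files are stored under their base name, so two
    inputs with the same base name would collide.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, paths):
            if path.is_file():
                zf.write(path, arcname=path.name)
        return zf.namelist()


if __name__ == "__main__":
    # Placeholder file names; replace them with your own logs and dumps.
    stored = bundle_for_support(["h2o.log", "stack_dump.txt"], "h2o_incident.zip")
    print("Archived:", stored)
```

The returned list lets you confirm which files actually made it into the archive before you attach it.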

We look forward to working with you through our community and support portals.

Avkash Chauhan

H2O Support: Customer focused and Community Driven