Photo Credit: https://en.wikipedia.org/wiki/W._T._Tutte
Top enhancements in this release:
ISAX: We actually introduced ISAX (Indexable Symbolic Aggregate ApproXimation) support a couple of releases back, but this version features more improvements and is worth a look. ISAX allows you to represent complex time series patterns using a symbolic notation, reducing the dimensionality of your data and allowing you to run our ML algos or use the index for searching or data analysis. For more information, check out the blog entry here: Indexing 1 billion time series with H2O and ISAX. (PUBDEV-3367, PUBDEV-3377, PUBDEV-3376)
GLM: Improved feature and parameter descriptions for GLM. Next focus will be on improving documentation for the K-Means algorithm (PUBDEV-3695, PUBDEV-3753, PUBDEV-3791).
Quasibinomial support in GLM: the quasibinomial family is similar to the binomial family except that, where the binomial models only support 0/1 for the values of a target, the quasibinomial family allows for two arbitrary values. This feature was requested by advanced users of H2O for applications such as implementing their own advanced estimators. (PUBDEV-3482, PUBDEV-3791)
GBM/DRF high cardinality accuracy improvements: Fixed a bug in the handling of large categorical features (cardinality > 32) that was there since the first release of H2O-3. Certain such categorical tree split decisions were incorrect, essentially sending observations down the wrong path at any such split point in the decision tree. The error was systematic and consistent between in-H2O and POJO/MOJO, and led to lower training accuracy (and often, to lower validation accurary). The handling of unseen categorical levels (in training and testing) was also inconsistent and unseen levels would go left or right without any reason – now they follow the path of a missing values consistently. Generally, models involving high-cardinality categorical features should have improved accuracy now. This change might require re-tuning of model parameters for best results. In particular the nbins_cats parameter, which controls the number of separable categorical levels at a given split, which has a large impact on the amount of memorization of per-level behavior that is possible: higher values generally (over)fit more.
Direct Download: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/1/index.html
For each PUBDEV-* information please look at the release note links at the top of this article
Accordingly to VP of Engineering Bill Gallmeister, this release consist of signifiant work done by his engineering team. For more information on these features and all the other improvements in H2O version 22.214.171.124, review our documentation.
Happy Holidays from all H2O team!!
@avkashchauhan (Avkash Chauhan)