H2O & LiblineaR: A tale of L2-LR

tl;dr: H2O and LiblineaR have nearly identical predictive performance.
 

Overview

 

In this blog, we examine the single-node implementations of L2-regularized logistic regression (LR) by H2O and LiblineaR.

Both LiblineaR and H2O are driven from the R console on the same hardware and evaluated on the same datasets. We compare regression coefficients and predictive behavior (AUC, precision, recall, F1) on held-out data. Before starting the performance comparison, let's discuss some of the differences between the two packages.

 

Implementation Differences

 

Whoa… there shouldn't be any modeling differences, right? Well… no, but there can be subtle implementation differences! Here we explain a few of the implementation details of H2O's GLM and of LiblineaR.
 

H2O

 

While we don't focus on the distributed aspects of H2O, it should be acknowledged that H2O's GLM results come back as if the model had been built on a single machine, so a distributed fit retains single-machine quality! H2O's state-of-the-art GLM uses Stephen Boyd's ADMM solver, allows for any combination of L1 & L2 penalties, performs automatic factor expansion (easily handling factors with thousands of levels) and cross-validation, and optionally performs a grid search over the parameters. H2O's GLM reports a range of model evaluation metrics: AUC, AIC, error, by-class error, and deviances.
 
How does H2O distribute GLM?
 
A Gram matrix is built in a parallel and distributed way. The algorithm is essentially a two-step, iterative process: build a Gram matrix, solve for the betas, and repeat until the betas converge. In a distributed setting with N nodes, each node computes a Gram over its own data. The Grams are then reduced together, and the result is bit-for-bit identical to doing it all locally. If you want more, here are some slides on what we implemented: http://www.slideshare.net/mobile/0xdata/glm-talk-tomas. Also, here is a link to the implementation in our git repository: https://github.com/0xdata/h2o/tree/master/src/main/java/hex/glm.
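
To illustrate the reduce step described above, here is a minimal sketch in Python (not H2O's Java implementation). It assumes an unweighted Gram $$X^T X$$ and a hypothetical split of six rows across three nodes; the helper names `gram` and `add` are made up for this example. Small integer data is used so that the equality can be checked exactly.

```python
def gram(rows):
    """Return X^T X for a list of equal-length rows (plain nested lists)."""
    p = len(rows[0])
    g = [[0] * p for _ in range(p)]
    for r in rows:
        for i in range(p):
            for j in range(p):
                g[i][j] += r[i] * r[j]
    return g


def add(a, b):
    """Element-wise sum of two p x p matrices (the reduce operation)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy data: six rows, three columns (illustrative values only).
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1], [2, 2, 2], [0, 3, 1]]

# Hypothetical 3-node split: each "node" sees only its own rows.
chunks = [rows[0:2], rows[2:4], rows[4:6]]
partials = [gram(chunk) for chunk in chunks]

# Reduce: sum the per-node Grams.
reduced = partials[0]
for partial in partials[1:]:
    reduced = add(reduced, partial)

# Gram computed over all rows at once, as on a single machine.
full = gram(rows)
print(reduced == full)
```

Each per-node Gram is only p × p, so the amount of data exchanged in the reduce depends on the number of columns rather than the number of rows.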

 

LiblineaR

 

LiblineaR is also open source: it is an R interface to the LIBLINEAR C/C++ library for linear models. We note that it is discussed extensively elsewhere [pdf], but also point out that it too has grid search and cross-validation capabilities.

 

In order to make fair comparisons, we match the input parameters between H2O and LiblineaR. Note that the cost parameter in LiblineaR is inversely proportional to the lambda used in H2O, scaled inversely by the number of parameters in the model:

 

$$C = \cfrac{1}{(\ell \times \lambda)}$$

 

where $$C$$ is the cost parameter in LiblineaR, $$\ell$$ is the number of features, and $$\lambda$$ is the shrinkage parameter.
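
As a quick sanity check, the relation above can be applied to the parameter settings listed later in this post. The Python sketch below (the function names are hypothetical) converts between the two parameterizations and backs out the $$\ell$$ implied by each reported lambda and cost pair.

```python
def cost_from_lambda(lam, ell):
    """LiblineaR cost C for an H2O lambda, using C = 1 / (ell * lambda)."""
    return 1.0 / (ell * lam)


def implied_ell(cost, lam):
    """Solve C = 1 / (ell * lambda) for ell."""
    return 1.0 / (cost * lam)


# (lambda, cost) pairs as reported in the Datasets section of this post.
settings = {
    "Prostate": (1.0 / 700, 100),
    "Sample Airlines": (0.0033333, 100),
    "Full Airlines": (0.0033333, 100),
}

for name, (lam, cost) in settings.items():
    print(f"{name}: implied ell = {implied_ell(cost, lam):.3f}")
```

With these settings the implied $$\ell$$ comes out to 7 for Prostate and about 3 for both airlines models. The Prostate value is one more than its six listed features, which would be consistent with counting the intercept, though that reading is an assumption.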

 

Hardware, Software, & Datasets

 

Hardware

 
All comparisons were performed on a single machine with the following attributes (from /proc/cpuinfo):


processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping : 7
microcode : 0x710
cpu MHz : 1200.000
cache size : 20480 KB
physical id : 1
siblings : 16
core id : 7
cpu cores : 8
apicid : 47
initial apicid : 47
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 5199.90
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual

 

Software

We used R version 3.0.2 “Frisbee Sailing” to interface with both LiblineaR (version 1.93) and H2O (build 1064).

 

Driving H2O from within R is easy! Check out this blog post, http://0xdata.com/blog/2013/08/run-h2o-from-within-r/, and some slides from a recent meetup on the subject, http://0xdata.com/blog/2013/08/big-data-science-in-h2o-with-r/. Of course, this is all documented as well: http://docs.0xdata.com/Ruser/Rwrapper.html

 

Datasets

We used 3 datasets: Prostate, Sample Airlines (years 1987 – 2008), and Full Airlines (years 1987 – 2013). These data are publicly available to download. The parameters and models built on these datasets are as follows:

| | Prostate | Sample Airlines ('87 – '08) | Full Airlines ('87 – '13) |
|---|---|---|---|
| Features in Model | 6 | 3 | 3 |
| Number of Training Instances | 306 | 24,442 | 128,654,471 |
| Number of Testing Instances | 76 | 2,692 | 14,290,947 |

 

Prostate: capsule ~ gleason + dpros + psa + dcaps + age + vol

| H2O | LiblineaR |
|---|---|
| family = binomial | type = 0 |
| link = logit | .. |
| lambda = 1 / 700 | cost = 100 |
| alpha = 0.0 | .. |
| beta_epsilon = 1E-4 | epsilon = 1E-4 |
| nfolds = 1 | cross = 0 |

 

Sample Airlines (years 1987 – 2008, sampled): isdepdelayed ~ deptime + arrtime + distance

| H2O | LiblineaR |
|---|---|
| family = binomial | type = 0 |
| link = logit | .. |
| lambda = 0.0033333 | cost = 100 |
| alpha = 0.0 | .. |
| beta_epsilon = 1E-4 | epsilon = 1E-4 |
| nfolds = 1 | cross = 0 |

 

Full Airlines (years 1987 – 2013): isdepdelayed ~ deptime + arrtime + distance

| H2O | LiblineaR |
|---|---|
| family = binomial | type = 0 |
| link = logit | .. |
| lambda = 0.0033333 | cost = 100 |
| alpha = 0.0 | .. |
| beta_epsilon = 1E-4 | epsilon = 1E-4 |
| nfolds = 1 | cross = 0 |

 

Numerical Performance

 

Prostate

| Betas | AGE | DPROS | DCAPS | PSA | VOL | GLEASON | INTERCEPT |
|---|---|---|---|---|---|---|---|
| H2O | -0.06725409 | 0.5742158 | 0.1369673 | 0.4041241 | -0.2270453 | 1.170544 | -0.4930266 |
| LiblineaR | 0.06878511 | -0.582572 | -0.1335687 | -0.4056746 | 0.2309275 | -1.197098 | 0.4969579 |

Mean relative difference: 0.01601093
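
The reported figure is consistent with the definition R's `all.equal` uses for numeric vectors: the mean absolute difference divided by the mean absolute value of the target. The Python sketch below recomputes it from the rounded Prostate betas, taking H2O as the target and negating the LiblineaR coefficients first, since every LiblineaR beta has the opposite sign. The sign flip is an assumption about how the comparison was made; one plausible cause is that the two packages treat opposite class labels as the positive class.

```python
def mean_relative_difference(target, current):
    """mean(|target - current|) / mean(|target|).

    Both means share the same n, so the ratio of sums is equivalent.
    """
    num = sum(abs(t - c) for t, c in zip(target, current))
    den = sum(abs(t) for t in target)
    return num / den


# Prostate betas in the order AGE, DPROS, DCAPS, PSA, VOL, GLEASON, INTERCEPT.
h2o = [-0.06725409, 0.5742158, 0.1369673, 0.4041241,
       -0.2270453, 1.170544, -0.4930266]
liblinear = [0.06878511, -0.582572, -0.1335687, -0.4056746,
             0.2309275, -1.197098, 0.4969579]

# Assumption: align the sign convention before comparing.
flipped = [-b for b in liblinear]

mrd = mean_relative_difference(h2o, flipped)
print(f"mean relative difference: {mrd:.5f}")
```

The recomputed value agrees with the reported 0.01601093 up to the rounding of the displayed betas. The same function applied to the two airlines coefficient tables reproduces their reported values without any sign flip.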
 

| Test Evaluation | AUC | Precision | Recall | F1 Score |
|---|---|---|---|---|
| H2O | 0.6907796 | 0.7608696 | 0.7608696 | 0.7608696 |
| LiblineaR | 0.6907796 | 0.7608696 | 0.7608696 | 0.7608696 |

 

Sample Airlines (years 1987 – 2008 sampled)

| Betas | DepTime | ArrTime | Distance | Intercept |
|---|---|---|---|---|
| H2O | 0.29061806 | -0.027987806 | 0.1360023 | 0.19251044 |
| LiblineaR | 0.29585398 | -0.032675851 | 0.1373844 | 0.19258853 |

Mean relative difference: 0.01759207
 

| Test Evaluation | AUC | Precision | Recall | F1 Score |
|---|---|---|---|---|
| H2O | 0.57245362 | 0.48479869 | 0.54078827 | 0.51126516 |
| LiblineaR | 0.56406416 | 0.35743632 | 0.56274256 | 0.43718593 |

 

Full Airlines (years 1987 – 2013)

| Betas | DepTime | ArrTime | Distance | Intercept |
|---|---|---|---|---|
| H2O | 0.3736 | 0.0233 | 0.1317 | -0.3933 |
| LiblineaR | 0.377 | 0.0209 | 0.132 | -0.393 |

Mean relative difference: 0.006942185
 

| Test Evaluation | AUC | Precision | Recall | F1 Score |
|---|---|---|---|---|
| H2O | 0.587 | 0.527 | 0.686 | 0.596 |
| LiblineaR | 0.552 | 0.841 | 0.625 | 0.717 |
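
As a consistency check on the evaluation tables, F1 is the harmonic mean of precision and recall, $$F_1 = 2PR / (P + R)$$. The short Python sketch below (the function name is hypothetical) recomputes F1 from the reported precision and recall for both airlines datasets.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the evaluation tables.
reported = {
    ("Sample Airlines", "H2O"): (0.48479869, 0.54078827, 0.51126516),
    ("Sample Airlines", "LiblineaR"): (0.35743632, 0.56274256, 0.43718593),
    ("Full Airlines", "H2O"): (0.527, 0.686, 0.596),
    ("Full Airlines", "LiblineaR"): (0.841, 0.625, 0.717),
}

for (dataset, package), (p, r, f1) in reported.items():
    recomputed = f1_score(p, r)
    print(f"{dataset} / {package}: recomputed {recomputed:.3f}, reported {f1:.3f}")
```

All four recomputed values agree with the tables to three decimal places.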

 

Remarks & Conclusions

 

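We can see that H2O and LiblineaR do not vary much from one another: the mean relative differences in the coefficients are all small (between $$\approx 0.7\%$$ and $$1.8\%$$). Typically, we would expect the objective functions being minimized to match exactly, while allowing for differences in the coefficients (we see here that the betas differ by between roughly $$10^{-4}$$ and $$3 \times 10^{-2}$$ in absolute value, once the opposite signs in the Prostate fit are accounted for). What is emphasized here are the similarities in predictive power: the AUCs are identical on Prostate and differ by less than 0.01 on Sample Airlines and by 0.035 on Full Airlines, although precision and recall diverge more noticeably on the airlines data.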

 

It would be informative to involve a third reference (e.g. glmnet) to bolster the comparisons here. As this is a first stab at comparing H2O and LiblineaR, it is by no means complete. We will continue to add to this blog other datasets fit for comparison, and additionally give benchmark characteristics.
 
Additionally, we have skipped over a couple of obvious things: no categoricals were used here and the models aren't very good. For this comparison, we stripped down to the bare minimum (expanding categoricals for LiblineaR will be something that is tackled in the future) and studied non-categorical data only. All modeling was done by first setting the cost parameter to 100 and then proceeding (nothing magic about $$C = 100$$).

 

Reproducibility

 
The data are here: https://s3.amazonaws.com/h2o-bench/blog-2013-10-10
And the R scripts are here: https://github.com/0xdata/h2o/tree/master/R/tests