GH-8487: implement HGLM gaussian [nocheck] #16403

wendycwong · 2024-09-30T22:39:06Z

This PR fixes issue #8487.

I have separated HGLM from GLM into its own toolbox. Gaussian is the only family supported for now. I still need to do the following:

- add client tests (Java, Python/R) to make sure model metrics are passed;
- add client tests to make sure the model summary, scoring history and coefficient tables are passed;
- check that we use the correct formula to estimate the residual noise variance (refer to the doc);
- check that we choose one of the likelihood methods; I implemented two (refer to the doc).
HGLM_H2O_Implementation.pdf

maurever

Hi @wendycwong. Thanks for this big contribution.

I reviewed 80/110 files. I will continue tomorrow. It would be nice if you kept only HGLM-related changes in this PR. For example, the implementation of HGLM can be one PR, and removing old code from GLM can be another. The implementation of the Python and R API can also be a separate PR.

It would make the review process much easier. Also, there would be less room for bugs.

h2o-algos/src/main/java/hex/gam/GAM.java

h2o-algos/src/main/java/hex/glm/GLM.java

h2o-algos/src/main/java/hex/hglm/ComputationStateHGLM.java

h2o-algos/src/main/java/hex/hglm/HGLM.java

h2o-algos/src/main/java/hex/hglm/HGLMModel.java

h2o-py/h2o/estimators/gam.py

h2o-py/tests/testdir_algos/glm/pyunit_GH_6722_separate_linear_beta_gaussian.py

h2o-py/tests/testdir_algos/glm/pyunit_benign_glm.py

h2o-py/tests/testdir_algos/glm/pyunit_link_functions_gaussian_glm.py

h2o-algos/src/main/java/hex/hglm/ComputationStateHGLM.java

h2o-algos/src/main/java/hex/hglm/MetricBuilderHGLM.java

h2o-algos/src/main/java/hex/schemas/HGLMModelV3.java

h2o-algos/src/test/java/hex/hglm/HGLMBasicTest.java

maurever · 2024-10-24T15:12:29Z

@wendycwong, I finished my review. I found just minor bugs. I tried to check all the math, and everything looks good. Tests passed.

Have you tried running your tests on multinode? Just to be sure.

Thanks for this huge contribution!

maurever · 2024-10-25T15:24:47Z

Hi @wendycwong. Thanks for incorporating the suggestions. Two HGLM tests are still failing. Once all the tests pass, I can approve the PR.

- GH-8487: crafting HGLM parameters.
- GH-8487: implement EM algo.
- GH-8487: forming the fixed matrices and vectors.
- GH-8487: add test to make sure correct initialization of fixed, random coefficients, sigma values and T matrix.
- GH-8487: Finished implementing EM to estimate fixed coefficients, random coefficients, tmat and tauEVar.
- GH-8487: finished implementing prediction but still need to figure out the model metrics calculation.
- GH-8487: Adding support for models without random intercept.
- GH-8487: adding normalization and denormalization of coefficients for fixed and random.
- GH-8487: Completed prediction implementation and added tests to make sure prediction is correct when standardize=true/false, random_intercept = true/false.
- GH-8487: fixing model metric classes.
- GH-8487: add python and R tests.
- GH-8487: adding hooks to generate synthetic data.
- GH-8487: added scoring history, model summary, coefficient tables.
- GH-8487: added modelmetrics for validation frame.
- GH-8487: From experiment to find best tauEVar calculation process. The one in equation 10 is best.
- GH-8487: add capability in Python client to extract scoring history, model summary, model metrics, model coefficients (fixed and random), icc, T matrix, residual variance.
- GH-8487: done checking scoring history, model summary and model metrics.
- GH-8487: added R client test for utility functions.
- GH-8487: use lambda_ instead of Lambda in pyunit_benign_glm.py
- GH-8487: remove standardize from HGLM as the convention does not do standardization.
- Move test to check init values are set correctly to Python from Java. I was not able to find a good combination of initial betas/ubetas and t matrix to make it work.

Co-authored-by: Veronika Maurerová <maurever@users.noreply.github.com>
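For readers unfamiliar with the EM estimation mentioned in the commits above, here is a minimal sketch. It is not the code in this PR. It assumes the simplest possible Gaussian HGLM: a fixed intercept plus one random intercept per group, so the T matrix reduces to a single variance `tau2`. The function name `em_random_intercept`, the variable names, and the toy data are all hypothetical, and the ICC shown is the textbook formula for a random-intercept model, which may differ from what H2O reports.

```python
def em_random_intercept(groups, n_iter=200):
    """EM sketch for y_ij = beta + u_j + e_ij,
    with u_j ~ N(0, tau2) and e_ij ~ N(0, sigma2).

    `groups` is a list of lists: one list of responses per group.
    Assumes the responses are not all identical (total variance > 0).
    """
    all_y = [y for g in groups for y in g]
    n_total = len(all_y)

    # Initial values (an arbitrary choice for this sketch):
    # grand mean for beta, total variance split evenly between components.
    beta = sum(all_y) / n_total
    total_var = sum((y - beta) ** 2 for y in all_y) / n_total
    sigma2 = tau2 = total_var / 2.0

    for _ in range(n_iter):
        # E-step: posterior mean and variance of each random intercept u_j.
        post_mean, post_var = [], []
        for g in groups:
            v = 1.0 / (len(g) / sigma2 + 1.0 / tau2)
            m = v * sum(y - beta for y in g) / sigma2
            post_mean.append(m)
            post_var.append(v)

        # M-step: update the fixed intercept, then both variance components.
        beta = sum(y - m for g, m in zip(groups, post_mean) for y in g) / n_total
        tau2 = sum(m * m + v for m, v in zip(post_mean, post_var)) / len(groups)
        sigma2 = sum(
            (y - beta - m) ** 2 + v
            for g, m, v in zip(groups, post_mean, post_var)
            for y in g
        ) / n_total

    # Intraclass correlation: share of variance due to the grouping.
    icc = tau2 / (tau2 + sigma2)
    return beta, tau2, sigma2, icc


# Toy balanced data: three groups of three observations each.
toy = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
beta, tau2, sigma2, icc = em_random_intercept(toy)
# Converges to roughly beta=5.0, tau2=5.6667, sigma2=1.0, icc=0.85,
# the closed-form maximum likelihood estimates for this balanced design.
print(beta, tau2, sigma2, icc)
```

The toy design is balanced on purpose: its maximum likelihood estimates have a closed form, so the result of the iteration can be checked by hand. A model with random slopes would need a full T matrix and matrix algebra in both steps.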

wendycwong requested review from krasinski, valenad1, maurever and tomasfryda September 30, 2024 22:39

wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 2 times, most recently from 9ecc510 to 925042a Compare October 7, 2024 23:13

wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 3 times, most recently from 60ecdae to d7eeb43 Compare October 14, 2024 16:55

wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 7 times, most recently from bb90e33 to 1f5c45b Compare October 21, 2024 17:41

wendycwong changed the title ~~GH-8487: implement HGLM gaussian~~ GH-8487: implement HGLM gaussian [nocheck] Oct 22, 2024

wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 2 times, most recently from fa0559a to adcc679 Compare October 23, 2024 02:38

maurever reviewed Oct 23, 2024

wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch from 9777541 to b46698e Compare October 23, 2024 17:19

maurever reviewed Oct 24, 2024

wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 2 times, most recently from 6f84597 to f1a2948 Compare October 24, 2024 20:19

maurever self-requested a review October 25, 2024 15:24

hannah-tillman mentioned this pull request Oct 25, 2024

GH-16413: Adding HGLM solo algorithm page [nocheck] #16419

Merged

wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 2 times, most recently from cbe0444 to 3ed484b Compare October 27, 2024 23:34

wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch from b708d3c to 239f3c7 Compare October 28, 2024 19:10

remove commented out parts in test.

d827efe

krasinski approved these changes Oct 29, 2024

wendycwong merged commit 57bc954 into rel-3.46.0 Oct 29, 2024
61 of 77 checks passed

wendycwong deleted the wendy_gh_8487_HGLM_gaussian branch October 29, 2024 19:38
