GH-8487: implement HGLM gaussian [nocheck] #16403

Merged: 2 commits, Oct 29, 2024

Conversation

wendycwong (Collaborator)

This PR fixes this issue: #8487

I have separated HGLM from GLM as its own toolbox. The only family that is supported now is Gaussian. I still need to do the following:

  1. client tests (Java, Python/R) to make sure model metrics are passed;
  2. client tests to make sure the model summary, scoring history, and coefficient tables are passed;
  3. check that we use the correct formula to estimate the residual noise variance (refer to the doc);
  4. choose one of the two likelihood methods I implemented (refer to the doc).
    HGLM_H2O_Implementation.pdf

@wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 2 times, most recently from 9ecc510 to 925042a, on October 7, 2024 23:13
@wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 3 times, most recently from 60ecdae to d7eeb43, on October 14, 2024 16:55
@wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 7 times, most recently from bb90e33 to 1f5c45b, on October 21, 2024 17:41
@wendycwong changed the title from "GH-8487: implement HGLM gaussian" to "GH-8487: implement HGLM gaussian [nocheck]" on Oct 22, 2024
@wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 2 times, most recently from fa0559a to adcc679, on October 23, 2024 02:38
@maurever (Contributor) left a comment

Hi @wendycwong. Thanks for this big contribution.

I reviewed 80 of the 110 files and will continue tomorrow. It would be nice if you kept only HGLM-related changes in this PR. For example, the implementation of HGLM can be one PR, and removing old code from GLM can be another. The implementation of the Python and R APIs can also be a separate PR.

That would make the review process much easier and leave less room for bugs.

h2o-algos/src/main/java/hex/gam/GAM.java (outdated, resolved)
h2o-algos/src/main/java/hex/glm/GLM.java (outdated, resolved)
h2o-algos/src/main/java/hex/hglm/ComputationStateHGLM.java (outdated, resolved)
h2o-algos/src/main/java/hex/hglm/HGLM.java (outdated, resolved)
h2o-algos/src/main/java/hex/hglm/HGLMModel.java (outdated, resolved)
h2o-py/h2o/estimators/gam.py (outdated, resolved)
h2o-py/tests/testdir_algos/glm/pyunit_benign_glm.py (outdated, resolved)
h2o-algos/src/main/java/hex/hglm/ComputationStateHGLM.java (outdated, resolved)
@maurever (Contributor)

@wendycwong, I finished my review. I found just minor bugs. I tried to check all the math, and everything looks good. Tests passed.

Have you tried running your tests on multinode? Just to be sure.

Thanks for this huge contribution!

@wendycwong force-pushed the wendy_gh_8487_HGLM_gaussian branch 2 times, most recently from 6f84597 to f1a2948, on October 24, 2024 20:19
@maurever (Contributor)

Hi @wendycwong. Thanks for incorporating the suggestions. There are still two HGLM tests failing. So, after all the tests pass, I can approve the PR.

GH-8487: crafting HGLM parameters.
GH-8487: implement EM algo.
GH-8487: forming the fixed matrices and vectors.
GH-8487: add test to make sure correct initialization of fixed, random coefficients, sigma values and T matrix.
GH-8487: Finished implementing EM to estimate fixed coefficients, random coefficients, tmat and tauEVar
GH-8487: finished implementing prediction but still need to figure out the model metrics calculation.
GH-8487: Adding support for models without random intercept.
GH-8487: adding normalization and denormalization of coefficients for fixed and random.
GH-8487: Completed prediction implementation and added tests to make sure prediction is correct when standardize=true/false, random_intercept = true/false.
GH-8487: fixing model metric classes.
GH-8487: add python and R tests.
GH-8487: adding hooks to generate synthetic data.
GH-8487: added scoring history, model summary, coefficient tables.
GH-8487: added modelmetrics for validation frame.
GH-8487: From experiments to find the best tauEVar calculation process, the one in equation 10 is best.
GH-8487: add capability in Python client to extract scoring history, model summary, model metrics, model coefficients (fixed and random), icc, T matrix, residual variance.
GH-8487: done checking scoring history, model summary and model metrics.
GH-8487: added R client test for utility functions.
GH-8487: use lambda_ instead of Lambda in pyunit_benign_glm.py
GH-8487: remove standardize from HGLM as the convention does not do standardization.

Co-authored-by: Veronika Maurerová <maurever@users.noreply.github.com>

Move the test that checks init values are set correctly from Java to Python. I was not able to find a good combination of initial betas/ubetas and t matrix to make it work.
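
As a rough illustration of the prediction step these commits describe (fixed plus random coefficients, with or without a random intercept), here is a minimal Python sketch. It assumes the standard Gaussian mixed-model form, where a row's prediction is the fixed part plus the random part for that row's group. The function name `predict_hglm_gaussian`, the argument layout, and the intercept-last ordering are hypothetical choices for this sketch and are not taken from the H2O code.

```python
def predict_hglm_gaussian(x_fixed, z_random, group, beta, ubeta,
                          random_intercept=True):
    """Predict one row of a Gaussian mixed model (illustrative only).

    x_fixed:  fixed-effect predictor values for the row
    z_random: random-effect predictor values for the row
    group:    level of the grouping column for the row
    beta:     fixed coefficients, intercept stored last (assumed layout)
    ubeta:    dict mapping group level -> random coefficients for that
              level, random intercept stored last when enabled (assumed)
    """
    # Fixed part: x . beta plus the fixed intercept.
    fixed = sum(b * x for b, x in zip(beta[:-1], x_fixed)) + beta[-1]

    # Random part: z . u for this row's group, plus a random intercept
    # only when the model was built with one.
    u = ubeta[group]
    if random_intercept:
        random_part = sum(c * z for c, z in zip(u[:-1], z_random)) + u[-1]
    else:
        random_part = sum(c * z for c, z in zip(u, z_random))

    return fixed + random_part
```

A test along the lines of the ones mentioned above could call this once with `random_intercept=True` and once with `random_intercept=False` and compare each result with a hand-computed value.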
@wendycwong merged commit 57bc954 into rel-3.46.0 on Oct 29, 2024
61 of 77 checks passed
@wendycwong deleted the wendy_gh_8487_HGLM_gaussian branch on October 29, 2024 19:38
3 participants