Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] GH-6769: multinomial dt yuliia #16310

Open
wants to merge 32 commits into
base: master
Choose a base branch
from

Conversation

syzonyuliia
Copy link
Contributor

@syzonyuliia syzonyuliia commented Jun 19, 2024

@wendycwong
Copy link
Collaborator

Please add to your CompressLeaf class accommodation for leave nodes that will contain an array of probabilities one for each of the multinomial class. Currently, it only takes one value for binary classification.

@wendycwong
Copy link
Collaborator

I run into NPE error with this dataset:
sdt_3EnumCols_10kRows_multinomial.csv

@wendycwong
Copy link
Collaborator

With this code:

import sys
sys.path.insert(1,"../../../")
import h2o
from tests import pyunit_utils
from h2o.estimators import H2ODecisionTreeEstimator

def test_dt_multinomial():
data = h2o.import_file(pyunit_utils.locate("smalldata/sdt/sdt_3EnumCols_10kRows_multinomial.csv"))
response_col = "response"
data[response_col] = data[response_col].asfactor()

predictors = ["C1", "C2", "C3"]

# train model
dt = H2ODecisionTreeEstimator(max_depth=3)
dt.train(x=predictors, y=response_col, training_frame=data)

dt.show()

if name == "main":
pyunit_utils.standalone_test(test_dt_multinomial)
else:
test_dt_multinomial()

@valenad1
Copy link
Collaborator

I would like to have an basic

  • Java test for multinomial
  • Python test for multinomial
  • R test for multinomial
  • Comparsion with DRF one tree multinomial

@@ -128,6 +127,7 @@ private AbstractSplittingRule findBestSplitForFeature(Histogram histogram, int f


private static double calculateCriterionOfSplit(SplitStatistics binStatistics) {
// if(binStatistics.() == 2) // todo - fix bin statistics first, they are binomial-only now
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please finish those TODOs

@wendycwong
Copy link
Collaborator

@syzonyuliia

Thank you for your work. It looks great.

@@ -19,6 +19,7 @@ public static final class DTParametersV3 extends ModelParametersSchemaV3<DTModel
"categorical_encoding",
"response_column",
"seed",
"distribution",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wendycwong what is the idea behind the distribution here? I see that we optimize entropy in splits.. the attribute is not used in code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants