Add Sklearn pipeline test for more complicated Visualizers #1255

lwgray · 2022-05-28T20:35:26Z

Some visualizers require additional work before sklearn pipeline tests can be written for them. It is likely that the underlying visualizer needs to expose the learned attributes required to draw the visualization. For example, using a sklearn pipeline with the InterClusterDistanceMetric visualizer raises the following error:

AttributeError: 'Pipeline' object has no attribute 'cluster_centers_'
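
The error above can be reproduced with scikit-learn alone, which may help when deciding how a visualizer should reach the learned attributes. The following is a minimal sketch, assuming a `KMeans` clusterer as the final pipeline step and synthetic data; `final_estimator` is a hypothetical helper written for illustration, not part of Yellowbrick or scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Small synthetic dataset (assumption: any numeric matrix works here).
rng = np.random.RandomState(0)
X = rng.rand(60, 4)

pipe = Pipeline([
    ("minmax", MinMaxScaler()),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])
pipe.fit(X)

# A fitted Pipeline does not forward arbitrary learned attributes
# of its final step, so this lookup fails with AttributeError.
try:
    pipe.cluster_centers_
    raised = False
except AttributeError:
    raised = True


def final_estimator(model):
    """Hypothetical helper: return the last step of a Pipeline, else the model."""
    if isinstance(model, Pipeline):
        return model.steps[-1][1]
    return model


# The learned attribute is available on the final step itself.
centers = final_estimator(pipe).cluster_centers_
```

One possible direction, then, is for a visualizer to look up learned attributes on the final step when it is handed a pipeline, rather than on the pipeline object. Whether that fits each visualizer listed below is a separate question.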

See related issues and PRs:
#1253
#1248
#1249

Issue:
#1257
PR:
#1259

Issue:
#1256
PR:
#1262

Decision Boundaries
RFECV
ValidationCurve
Add a pipeline model input test and quick method test for feature importances
Add a pipeline model input test and quick method test for alpha selection
Add a pipeline model input test and quick method test for InterClusterDistanceMetric
KElbowVisualizer
SilhouetteVisualizer
GridSearchColorPlot

Example

    def test_within_pipeline(self):
        """
        Test that visualizer can be accessed within a sklearn pipeline
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()

        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('cvscores', CVScores(BernoulliNB(), cv=cv))
        ])

        model.fit(X, y)
        model['cvscores'].finalize()
        self.assert_images_similar(model['cvscores'], tol=2.0)

    def test_within_pipeline_quickmethod(self):
        """
        Test that visualizer quickmethod can be accessed within a
        sklearn pipeline
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()
        
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()),
            ('cvscores', cv_scores(BernoulliNB(), X, y, cv=cv, show=False,
                                   random_state=42))
        ])
        self.assert_images_similar(model['cvscores'], tol=2.0)

    def test_pipeline_as_model_input(self):
        """
        Test that visualizer can handle sklearn pipeline as model input
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()

        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('nb', BernoulliNB())
        ])

        oz = CVScores(model, cv=cv)
        oz.fit(X, y)
        oz.finalize()
        self.assert_images_similar(oz, tol=2.0)

    def test_pipeline_as_model_input_quickmethod(self):
        """
        Test that visualizer can handle sklearn pipeline as model input
        within a quickmethod
        """
        X, y = load_mushroom(return_dataset=True).to_numpy()
        X = OneHotEncoder().fit_transform(X).toarray()

        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=11)
        model = Pipeline([
            ('minmax', MinMaxScaler()), 
            ('nb', BernoulliNB())
        ])

        oz = cv_scores(model, X, y, show=False, cv=cv)
        self.assert_images_similar(oz, tol=2.0)

@DistrictDataLabs/team-oz-maintainers

lwgray added the level: intermediate python coding expertise required label May 28, 2022

lwgray commented May 28, 2022 • edited by pdamodaran