Skip to content

EdgarJaber/conformalized_gp

 
 

Repository files navigation

Conformal Approach To Gaussian Process Surrogate Evaluation With Coverage Guarantees

What is done in this repo

🔗 Requirements

Python 3.7+

OpenTURNS is a C++ library made, hence one can need to install gcc to be able to run the library

Ubuntu:

$ sudo apt update
$ sudo apt install build-essential

OSX:

$ brew install gcc

Windows: Install MinGW (a Windows distribution of gcc) or Microsoft’s Visual C

Install the required packages:

  • Via pip:
$ pip install -r requirements.txt
  • Via conda:
$ conda install -f environment.yml

🛠 Installation

Clone the repo and run the following command in the conformalized_gp directory to install the code

$ pip install .

⚡️ Quickstart

Here is a @quickstart to use the Jackknife+GP method on any regression dataset. Here, the goal is the compare visually the results given by the standard Jackknife+ method, the Credibility Intervals and our methodology. The notebook from which this quickstart is inspired can be found here

We first start to import the necessary packages

import matplotlib.pyplot as plt
import numpy as np
import scipy
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import train_test_split

from mapie.conformity_scores.residual_conformity_scores import GPConformityScore
from mapie.regression import MapieRegressor

BLUE = np.array([[26, 54, 105]]) / 255
ORANGE = np.array([[223, 84, 49]]) / 255
YELLOW = np.array([[242, 188, 64]]) / 255
  • In this example, we are going to work on an analytical function of our imagination which have some good visual behavior :

$$g(x) = 3x\sin(x) - 2x\cos(x) + \frac{x^3}{40} - \frac{x^2}{2} - 10x$$

def g(x):
    return (3 * x * np.sin(x) - 2 * x * np.cos(x) + ( x ** 3) / 40 - .5 * x ** 2 - 10 * x)

x_mesh = np.linspace(-40, 60, 5000)
plt.plot(x_mesh, g(x_mesh))
plt.xlabel("$x$")
plt.ylabel("$g(x)$")

toy function

  • Then we split our data into train and test and train au sickit-learn GaussianProcessRegressor with a RBF kernel.
X_train, X_test, y_train, y_test = train_test_split(x_mesh, g(x_mesh), test_size=.98, random_state=42)
X_train = X_train.reshape(-1, 1)
X_test = X_test.reshape(-1, 1)
gp = GaussianProcessRegressor(normalize_y=True)
gp.fit(X_train, y_train)
  • We then define and train the two conformal methods (J+ and J+GP):
mapie_j_plus_gp = MapieRegressor(
    estimator=gp,
    cv=-1,
    method="plus",
    conformity_score=GPConformityScore(),
    model_has_std=True,
    random_state=42
)

mapie_j_plus = MapieRegressor(
    estimator=gp,
    cv=-1,
    method="plus",
    conformity_score=None,
    model_has_std=False,
    random_state=42
)


mapie_j_plus_gp.fit(X_train, y_train)
mapie_j_plus.fit(X_train, y_train)
  • Finally, we predict and compute prediction intervals with a confidence level of 90% on the test set and plot the prediction intervals of the three methods
ALPHA = .1

_, y_pss_j_plus_gp = mapie_j_plus_gp.predict(x_mesh.reshape(-1, 1), alpha=ALPHA)
_, y_pss_j_plus = mapie_j_plus.predict(x_mesh.reshape(-1, 1), alpha=ALPHA)

y_mean, y_std = gp.predict(x_mesh.reshape(-1, 1), return_std=True)

q_alpha_min = scipy.stats.norm.ppf(ALPHA / 2)
q_alpha_max = scipy.stats.norm.ppf(1 - ALPHA / 2)

f, ax = plt.subplots(1, 1, figsize=(20, 10))
ax.scatter(X_train, y_train, c=BLUE)


ax.plot(x_mesh, g(x_mesh), c=BLUE)
ax.plot(x_mesh, y_mean, c=YELLOW)
ax.fill_between(
        x_mesh,
        y_mean + y_std * q_alpha_min,
        y_mean + y_std * q_alpha_max,
        alpha=0.3,
        color=YELLOW,
        label=r"$\pm$ 1 std. dev.",
    )


ax.fill_between(
        x_mesh,
        y_pss_j_plus_gp[:, 0, 0],
        y_pss_j_plus_gp[:, 1, 0],
        alpha=.6,
        color=ORANGE,
        label=r"$\pm$ 1 std. dev.",
    )

ax.fill_between(
        x_mesh,
        y_pss_j_plus[:, 0, 0],
        y_pss_j_plus[:, 1, 0],
        alpha=.3,
        color="g",
        label=r"$\pm$ 1 std. dev.",
    )
ax.legend(
    [
        "Training Points",
        "True function", "Mean of posterior GP",
        "Posterior GP Credibility Interval",
        "Prediction Interval J+GP",
         "Prediction Interval J+", 
    ]
)
ax.set_xlabel("$x$")
ax.set_ylabel("$g(x)$")

toy function intervals

🔌 Plug OpenTURNS GP into MAPIE

If you wish to use our code with an OpenTURNS model, we have implemented a simple wrapper around the model so that it can be used very easily:

from wrappers import GpOTtoSklearnStd

nu = 5/2  # Hyperparameter of the Matérn Kernel
noise = None  # Standard deviation of the nugget effect. If None, no nugget effect is applied.
gp_estimator = GpOTtoSklearnStd(scale=1, amplitude=1, nu=nu, noise=None)

This estimator is now fully compatible with MAPIE as it comes with it .fit and .predict methods.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 94.6%
  • Python 5.4%