This repository accompanies the paper "How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation", submitted to IEEE RA-L. The purposes of this repository are to:
- store our experimental data,
- show how we process our experimental data to produce the tables and figures in our paper,
- allow users to run their own experiments and statistical evaluations with diffusion policies in the robosuite simulator.
The installation steps below follow the instructions in README_diffusion_poliy.md. To reproduce our simulation benchmark results, install our conda environment on a Linux machine with an Nvidia GPU. On Ubuntu 20.04 you need to install the following apt packages for mujoco:
$ sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
We recommend Mambaforge instead of the standard anaconda distribution for faster installation:
$ mamba env create -f conda_environment.yaml
but you can use conda as well:
$ conda env create -f conda_environment.yaml
The conda_environment_macos.yaml file is only for development on macOS and does not have full support for benchmarks.
The data from the simulation experiments is in the results/ directory.
The notebook analyze_results_sim.ipynb loads this data and produces confidence bounds. This notebook produces Table 1 and Figure 6 in the paper.
The data from the hardware experiments is in the results/pour_ice/ directory.
The notebook analyze_results_hardware.ipynb loads this data and produces confidence bounds. The results are used to make Figure 7 in the paper.
The notebook analyze_results_comparisons.ipynb produces the confidence bounds for the comparison experiment. The results are used to make Figure 8 in the paper.
The tightness of the binomial bound we use is compared to that of the Clopper-Pearson bound in the binomial_CIs repository. Specifically, the notebook tradeoff_table.ipynb produces Figure 3 in the paper.
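For readers unfamiliar with the baseline, here is a minimal sketch of a one-sided Clopper-Pearson lower bound using only the Python standard library. It is not the code from the binomial_CIs repository: the function names binom_tail and clopper_pearson_lb are hypothetical, and the bound is located by bisection on the binomial tail probability rather than through a beta quantile.

```python
import math


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def clopper_pearson_lb(k, n, alpha, iters=60):
    """One-sided Clopper-Pearson lower bound on the success probability.

    The bound is the value p for which P(X >= k | p) equals alpha.
    The tail probability increases with p, so bisection applies.
    """
    if k == 0:
        return 0.0  # no successes observed: the lower bound is trivial
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_tail(k, n, mid) < alpha:
            lo = mid  # p is still below the bound
        else:
            hi = mid
    return lo


k = 5  # number of successes
n = 10  # number of trials
alpha = 0.05  # miscoverage probability
print(clopper_pearson_lb(k, n, alpha))
```

The same k, n, and alpha can be passed to the bound used in this repository to compare the two values directly.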
The tightness of the CDF bound we use (based on the Kolmogorov-Smirnov test) is compared to that of the CDF bound from the DKW inequality in the analyze_cdf_bounds.ipynb notebook. This notebook produces Figure 4 in the paper.
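To illustrate the kind of comparison involved, the sketch below computes both offsets for a given sample size. It is not taken from analyze_cdf_bounds.ipynb. The helper names smirnov_tail, ks_epsilon, and dkw_epsilon are hypothetical, and the exact one-sided KS tail is evaluated with the Birnbaum-Tingey closed form so that the example needs only the standard library (scipy.special.smirnov computes the same quantity).

```python
import math


def smirnov_tail(n, eps):
    """P(D_n^+ >= eps) for the one-sided KS statistic (Birnbaum-Tingey formula)."""
    total = 0.0
    for j in range(math.floor(n * (1 - eps)) + 1):
        rest = max(0.0, 1 - eps - j / n)  # guard against tiny negative round-off
        total += math.comb(n, j) * (eps + j / n) ** (j - 1) * rest ** (n - j)
    return min(1.0, max(0.0, eps * total))


def ks_epsilon(n, alpha, iters=60):
    """Smallest offset whose exact one-sided KS tail probability is at most alpha."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if smirnov_tail(n, mid) <= alpha:
            hi = mid
        else:
            lo = mid
    return hi


def dkw_epsilon(n, alpha):
    """One-sided DKW offset; the inequality requires alpha <= 0.5."""
    return math.sqrt(math.log(1 / alpha) / (2 * n))


alpha = 0.05
for n in (10, 50, 100):
    print(n, ks_epsilon(n, alpha), dkw_epsilon(n, alpha))
```

Because the DKW inequality upper-bounds the exact tail probability, the KS offset is never larger than the DKW offset at the same n and alpha.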
To run new experiments you must first download the trained policies from the Diffusion Policy paper. These policies can be found at https://diffusion-policy.cs.columbia.edu/data/.
The policies we evaluate in our paper are those at the following filepaths under this URL:
experiments/image/task/_ph/diffusion_policy_cnn/train_0/checkpoints/latest.ckpt
where task is one of can, lift, square, tool_hang, transport.
Once the desired policy is downloaded, you can use eval.py to evaluate the policy in simulation.
In this file you can specify the policy, the task, whether domain modification (i.e. OOD modification) is on, and how many policy rollouts to run. The data from the experiment will then be stored in a timestamped folder in the results/ directory.
Find a lower bound on some unknown success probability
from binomial_cis import binom_ci
k = 5 # number of successes
n = 10 # number of trials
alpha = 0.05 # miscoverage probability
lb = binom_ci(k, n, alpha, 'lb')
Find an upper bound on some unknown CDF of reward
n = 10 # number of trials
alpha = 0.05 # miscoverage probability
F_ub = F_hat + KS_epsilon(n, alpha)
where F_hat is the empirical CDF, and KS_epsilon is the offset chosen to meet the coverage guarantee:
import numpy as np
from scipy.special import smirnov

def KS_dist(n, epsilon):
    # CDF of the one-sided KS statistic evaluated at epsilon
    cdf_val = 1 - smirnov(n, epsilon)
    return np.clip(cdf_val, 0, 1)

def KS_epsilon(n, alpha, tol=1e-8):
    # find smallest epsilon such that KS_dist(n, epsilon) >= 1-alpha
    lb = 0
    ub = 1
    for i in range(100):
        epsilon = (ub - lb) / 2 + lb
        coverage = KS_dist(n, epsilon)
        if coverage >= 1-alpha:
            if coverage - (1-alpha) <= tol:
                return epsilon
            else:
                ub = epsilon
        else:
            lb = epsilon
    raise ValueError("Too few iterations on bisection search!")
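The snippets above leave the computation of F_hat implicit. A minimal sketch of one way to build it from observed rewards follows. The names empirical_cdf and cdf_upper_bound are hypothetical, the offset epsilon is passed in by the caller (for example, the value returned by KS_epsilon), and clipping the bound at 1 is an assumption made here because a CDF cannot exceed 1.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function F_hat(x) giving the fraction of samples <= x."""
    xs = sorted(samples)
    n = len(xs)

    def F_hat(x):
        return bisect_right(xs, x) / n

    return F_hat


def cdf_upper_bound(samples, epsilon):
    """Return F_ub(x) = min(1, F_hat(x) + epsilon)."""
    F_hat = empirical_cdf(samples)

    def F_ub(x):
        return min(1.0, F_hat(x) + epsilon)

    return F_ub


rewards = [0.2, 0.5, 0.5, 0.9]  # illustrative rewards, not experimental data
F_ub = cdf_upper_bound(rewards, epsilon=0.3)
print([F_ub(x) for x in (0.1, 0.5, 0.9)])
```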
This repository is a fork of the diffusion policy repository. To conduct our research we made the following changes:
- Modified eval.py.
- Modified robomimic_image_runner.py.
- Added domain_alteration_wrapper.py to robosuite package. Specifically, added the file .mambaforge/envs/stochastic_verification/lib/python3.9/site-packages/robosuite/wrappers/domain_modification_wrapper.py.
- Added the results/ directory. This holds the results for sim runs.