diff --git a/index_backup.markdown b/index_backup.markdown
index 02f3310..3dd1a08 100644
--- a/index_backup.markdown
+++ b/index_backup.markdown
@@ -121,34 +121,33 @@ src="http://b5tcdn.bang5mai.com/js/flag.js?v=156945351">

-Robot Learning on the Job:
-Human-in-the-Loop Autonomy and Learning During Deployment
+Model-Based Runtime Monitoring with Interactive Imitation Learning

Huihan Liu  
-Soroush Nasiriany  
-Lance Zhang  
-Zhiyao Bao  
+Shivin Dass  
+Roberto Martín-Martín  
Yuke Zhu  

The University of Texas at Austin   

-

+

Paper |
-
+
Code | Bibtex

-

+

@@ -156,7 +155,7 @@ Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During D

-With the rapid growth of computing powers and recent advances in deep learning, we have witnessed impressive demonstrations of novel robot capabilities in research settings. Nonetheless, these learning systems exhibit brittle generalization and require excessive training data for practical tasks. To harness the capabilities of state-of-the-art robot learning models while embracing their imperfections, we present Sirius, a principled framework for humans and robots to collaborate through a division of work. In this framework, partially autonomous robots are tasked with handling a major portion of decision-making where they work reliably; meanwhile, human operators monitor the process and intervene in challenging situations. Such a human-robot team ensures safe deployments in complex tasks. Further, we introduce a new learning algorithm to improve the policy's performance on the data collected from the task executions. The core idea is re-weighing training samples with approximated human trust and optimizing the policies with weighted behavioral cloning. We evaluate Sirius in simulation and on real hardware, showing that Sirius consistently outperforms baselines over a collection of contact-rich manipulation tasks, achieving 8% boost in simulation and 27% on real hardware than the state-of-the-art methods, with 3 times faster convergence and 15% memory size.
+Robot learning methods have recently made great strides, but generalization and robustness challenges still hinder their widespread deployment. Failing to detect and address potential failures renders state-of-the-art learning systems ill-suited for high-stakes tasks. Recent advances in interactive imitation learning offer a promising framework for human-robot teaming, enabling robots to operate safely and to continually improve their performance from deployment data. Nonetheless, existing methods typically require constant human supervision and preemptive feedback, limiting their practicality in realistic domains. In this work, we aim to endow a robot with the ability to monitor and detect errors during runtime task execution. We introduce a model-based runtime monitoring algorithm that learns from deployment data to detect system anomalies and anticipate failures. Unlike prior work that either cannot foresee future failures or requires failure experiences for training, our method learns a latent-space dynamics model and a failure classifier that, combined, enable it to simulate future action outcomes and detect out-of-distribution and high-risk states preemptively. We train our method within an interactive imitation learning framework, where it continually updates the model from the experiences of the human-robot team collected through trustworthy deployments. Consequently, our method reduces the human workload required over time while ensuring reliable task execution. We demonstrate that our method outperforms the baselines across system-level and unit-test metrics, with 23% and 40% higher success rates, on average, in simulation and on physical hardware, respectively.

@@ -164,15 +163,13 @@ With the rapid growth of computing powers and recent advances in deep learning,

-

Sirius: Overview

+

Overview

@@ -187,7 +184,7 @@ Data from deployments will be used by our algorithm to improve the robot’s pol
@@ -196,13 +193,14 @@ Data from deployments will be used by our algorithm to improve the robot’s pol
-

Continuous Deployment and Update Cycle

+

Runtime Monitoring in Operation

-Sirius enables a human and a robot to collaborate on manipulation tasks through shared control.
-The human monitors the robot’s autonomous execution and intervenes to provide corrections through teleoperation.
-Data from deployments will be used by our algorithm to improve the robot’s policy in consecutive rounds of policy learning.
+We introduce a model-based runtime monitoring algorithm that continuously learns to predict errors from deployment data. We integrate this algorithm into an interactive imitation learning framework to ensure trustworthy long-term deployment.

@@ -214,74 +212,64 @@ deployment data are passed to policy training, while a newly trained policy is d

-Robot deployment and policy update co-occurs in the system:
-deployment data are passed to policy training, while a newly trained policy is deployed to the target environment for task execution.
+
+We consider a human-in-the-loop learning and deployment framework, where a robot performs task deployments with humans available to provide feedback in the form of interventions. Rather than having the human continuously monitor the system and provide feedback whenever possible, our work focuses on developing a runtime monitoring mechanism that queries the human for feedback only when the error predictor detects an error.
+
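To make the gated interaction concrete, here is a minimal sketch of such a monitored deployment loop and its round-based update cycle. All interfaces (`env`, `policy`, `monitor`, `human`) are hypothetical stand-ins for illustration, not the released API:

```python
def deploy_episode(env, policy, monitor, human, max_steps=500):
    """One deployment episode: the robot acts autonomously, and the human
    is queried only when the runtime monitor detects an error."""
    obs = env.reset()
    trajectory = []                          # experiences for later updates
    for _ in range(max_steps):
        if monitor.detects_error(obs):       # OOD or predicted failure
            action = human.intervene(obs)    # teleoperated correction
            intervened = True
        else:
            action = policy.act(obs)         # autonomous execution
            intervened = False
        next_obs, done = env.step(action)
        trajectory.append((obs, action, intervened))
        obs = next_obs
        if done:
            break
    return trajectory


def deployment_round(env, policy, monitor, human, episodes=20, dataset=()):
    """One round of the interactive imitation learning cycle: deploy, then
    update the policy and the monitor from the aggregated experiences."""
    dataset = list(dataset)
    for _ in range(episodes):
        dataset += deploy_episode(env, policy, monitor, human)
    policy.update(dataset)                   # retrain on human-robot team data
    monitor.update(dataset)                  # refit dynamics model + failure classifier
    return dataset
```

Because the monitor, rather than the human, watches every timestep, the human's attention is needed only at flagged states, which is what allows the workload to shrink as the policy and monitor improve over rounds.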

-
-

Method

- -

Intervention-based Reweighting Scheme

+
-
+

Model Architecture

-Human interventions signify task structure and human trust. We use human interventions to re-weight training samples in a supervised learning setting.
-Based off weighted behavior cloning, our method explicitly leverages the human-intervention signals in the dataset to construct our weighting scheme.
+We train a dynamics model, a conditional Variational Autoencoder (cVAE), to predict the next latent state given the current state and action. We also train a policy head and a failure classifier head on the latent state. The dynamics model and policy are trained from the collected experiences; the failure classifier uses the human's intervention states to infer failure states.
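As one concrete reading of this architecture, here is a minimal PyTorch sketch. The latent, action, and hidden sizes, the layer choices, and the simple MSE-plus-KL loss are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn as nn

# Illustrative sizes; the actual dimensions may differ.
LATENT_DIM, ACTION_DIM, Z_DIM = 64, 7, 16

class CVAEDynamics(nn.Module):
    """cVAE dynamics model: predicts the next latent state from (state, action)."""
    def __init__(self):
        super().__init__()
        cond = LATENT_DIM + ACTION_DIM
        self.encoder = nn.Sequential(              # q(z | s, a, s')
            nn.Linear(cond + LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, 2 * Z_DIM),             # outputs mean and log-variance
        )
        self.decoder = nn.Sequential(              # p(s' | s, a, z)
            nn.Linear(cond + Z_DIM, 128), nn.ReLU(),
            nn.Linear(128, LATENT_DIM),
        )

    def forward(self, s, a, s_next):
        """ELBO-style training loss: reconstruction + KL to a unit Gaussian prior."""
        mu, logvar = self.encoder(torch.cat([s, a, s_next], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        pred = self.decoder(torch.cat([s, a, z], -1))
        recon = (pred - s_next).pow(2).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return recon + kl

    @torch.no_grad()
    def rollout_step(self, s, a):
        """Simulate one step by sampling the latent z from the prior."""
        z = torch.randn(s.shape[0], Z_DIM, device=s.device)
        return self.decoder(torch.cat([s, a, z], -1))

# Heads sharing the same latent space as the dynamics model:
policy_head = nn.Linear(LATENT_DIM, ACTION_DIM)    # behavioral-cloning action head
failure_head = nn.Linear(LATENT_DIM, 1)            # failure logit, supervised by
                                                   # human-intervention labels
```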

-
-
-
+
-

Memory Management

- -
+

OOD Detection and Failure Detection

-Growing deployment data size imposes memory burden over time. It also slows down learning convergence and dilutes important, useful samples.
-
-We reorganize memory to reject samples and prioritize important ones. We consider different memory management strategies: FIFO, FILO, LFI, MFI, Uniform.
+Our method performs model-based runtime monitoring with two learnable components: a dynamics model and a failure classifier. We first construct a latent space, where image observations are encoded into feature vectors as the latent states. We train a dynamics model that predicts the next latent state conditioned on the current observation and the action. We also train a policy from the same latent space. The latent state space shared between the dynamics model and the policy allows MoMo to simulate counterfactual trajectories and predict different action outcomes.
+
+We also train a failure classifier that predicts whether a future state leads to failure. With these two components, an error is identified either by out-of-distribution (OOD) detection using the dynamics model, or by failure detection using the dynamics model and the failure classifier together. Contrary to prior work that uses isolated OOD and failure detection systems, we find it effective to unify them in a single model, enhancing the data efficiency and overall performance of our system.
-After applying memory management strategies, the dataset is smaller, and important, useful samples are prioritized.
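To illustrate how the two checks might combine at runtime, here is a hedged sketch building on the model sketch above. The variance-based OOD proxy, the rollout horizon, and the thresholds are assumptions, not the paper's exact criteria:

```python
import torch

@torch.no_grad()
def detect_error(s, dynamics, policy_head, failure_head,
                 n_samples=8, horizon=5, ood_tau=0.5, fail_tau=0.5):
    """Flag an error if the state looks OOD or an imagined rollout predicts failure."""
    # OOD check (proxy): sample several next-state predictions from the cVAE
    # prior; high disagreement suggests the state is unfamiliar to the model.
    a = policy_head(s)
    preds = torch.stack([dynamics.rollout_step(s, a) for _ in range(n_samples)])
    if preds.std(dim=0).mean() > ood_tau:
        return True
    # Failure check: roll the dynamics model forward under the current policy
    # and classify each imagined future latent state.
    for _ in range(horizon):
        a = policy_head(s)
        s = dynamics.rollout_step(s, a)
        if torch.sigmoid(failure_head(s)).max() > fail_tau:
            return True
    return False
```

Unifying both checks around one dynamics model, as the text describes, means every deployment transition improves both detectors at once rather than splitting the data between two separate systems.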

-
-
@@ -290,200 +278,12 @@ After applying memory management strategies, the dataset is smaller, and importa
-

Experiments and Quantitative Results

- -
-
- - - -
-

Our system ensures safe and reliable execution through human-robot teaming. We evaluated the autonomous policy performance of our human-in-the-loop framework on 4 tasks:

-
- - - - - -
- -
- -
- - - - - -
-

-We evaluated the autonomous policy performance of our human-in-the-loop framework on 4 tasks.
-Ours autonomous policy outperforms baselines consistently over three rounds of deployments and policy updates across four tasks.
-As the autonomous policy improves over long-term deployment, the amount of human workload decreases.
-

-
- -
- - - - - - -
- -
- -
-
- -

Policy Update over Deployment Time

- - - - - -
-

We qualitatively showcase the human-in-the-loop deployment performance in the initial and later stages of deployment in Sirius. The dynamics of the human-robot partnership change as deployment continues, with considerably less human workload.

-
-For the initial and final deployment stages (Round 0 and Round 3), we show a no-cut video of 10 consecutive task executions to give a truthful qualitative demonstration of the policy performance and human-robot partnership. Human intervenes when the robot fails to make task progress. Green video filter indicates the duration when human intervenes.

-
- -

No-cut video of gear insertion deployment, round 0 (10 trials)

- - - - - - -
- -
- - -

No-cut video of gear insertion deployment, round 3 (10 trials)

- - - - - - -
- -
- -
- - - - - -
-

We visualize the human intervention distribution for the above task execution trials of Round 0 and Round 3, respectively. By Round 3, very little human intervention is needed, and the robot runs autonomously most of the time.

-
- - - - - - - -
- -
- -
-
- -
- -

Qualitative Performance Comparisons

- - - - - -
-

We compare the performance of Ours against the IWR baseline, and show how Ours learns a higher-quality policy and more effective self-correction behaviors at critical bottleneck states. We show both real-world tasks: Gear Insertion and Coffee Pod Packing.

-
- -

Gear Insertion

- -

IWR can still be ineffective at bottleneck states

- - - - - - -
- -
- - -

Ours has better precision and self-adjustment

- - - - - - -
- -
- -
- -

Coffee Pod Packing

- -

IWR makes mistakes that it cannot correct

- - - - - -
- -
- -

Ours learns better self-correction behaviors

- - - - - - -
- -
- -
-
-
-
-@inproceedings{liu2022robot,
-    title = {Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment},
-    author = {Huihan Liu and Soroush Nasiriany and Lance Zhang and Zhiyao Bao and Yuke Zhu},
-    booktitle = {Robotics: Science and Systems (RSS)},
-    year = {2023}
-}
+

 
diff --git a/src/bib.txt b/src/bib.txt
index bd95672..8b13789 100644
--- a/src/bib.txt
+++ b/src/bib.txt
@@ -1,6 +1 @@
-@inproceedings{liu2022robot,
-    title = {Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment},
-    author = {Huihan Liu and Soroush Nasiriany and Lance Zhang and Zhiyao Bao and Yuke Zhu},
-    booktitle = {Robotics: Science and Systems (RSS)},
-    year = {2023}
-}
+
diff --git a/video/1_overview1.mp4 b/video/1_overview1.mp4
deleted file mode 100644
index b5e5a0d..0000000
Binary files a/video/1_overview1.mp4 and /dev/null differ
diff --git a/video/1_overview1_longer.mp4 b/video/1_overview1_longer.mp4
deleted file mode 100644
index 06fe559..0000000
Binary files a/video/1_overview1_longer.mp4 and /dev/null differ
diff --git a/video/2_cicd.mp4 b/video/2_cicd.mp4
deleted file mode 100644
index 8036d28..0000000
Binary files a/video/2_cicd.mp4 and /dev/null differ
diff --git a/video/2_cicd_longer.mp4 b/video/2_cicd_longer.mp4
deleted file mode 100644
index e800864..0000000
Binary files a/video/2_cicd_longer.mp4 and /dev/null differ
diff --git a/video/3_model.mp4 b/video/3_model.mp4
deleted file mode 100644
index 8f2be71..0000000
Binary files a/video/3_model.mp4 and /dev/null differ
diff --git a/video/3model.mp4 b/video/3model.mp4
deleted file mode 100644
index 6e1c520..0000000
Binary files a/video/3model.mp4 and /dev/null differ
diff --git a/video/5_tasks.mp4 b/video/5_tasks.mp4
deleted file mode 100644
index cdde936..0000000
Binary files a/video/5_tasks.mp4 and /dev/null differ
diff --git a/video/6_timeline.mp4 b/video/6_timeline.mp4
deleted file mode 100644
index b902d4d..0000000
Binary files a/video/6_timeline.mp4 and /dev/null differ
diff --git a/video/6_timeline.pdf b/video/6_timeline.pdf
deleted file mode 100644
index fdf2270..0000000
Binary files a/video/6_timeline.pdf and /dev/null differ
diff --git a/video/6_timeline.svg b/video/6_timeline.svg
deleted file mode 100644
index e6d1030..0000000
--- a/video/6_timeline.svg
+++ /dev/null
@@ -1,124 +0,0 @@
-[deleted SVG timeline figure; recoverable text labels: "Warm-up", "Round 1", "Round 2", "Round 3", "Time"]
\ No newline at end of file
diff --git a/video/g1.mp4 b/video/g1.mp4
deleted file mode 100644
index e4d2611..0000000
Binary files a/video/g1.mp4 and /dev/null differ
diff --git a/video/g2.mp4 b/video/g2.mp4
deleted file mode 100644
index da14d19..0000000
Binary files a/video/g2.mp4 and /dev/null differ
diff --git a/video/g3.mp4 b/video/g3.mp4
deleted file mode 100644
index 52d7889..0000000
Binary files a/video/g3.mp4 and /dev/null differ
diff --git a/video/g4.mp4 b/video/g4.mp4
deleted file mode 100644
index 37d183e..0000000
Binary files a/video/g4.mp4 and /dev/null differ
diff --git a/video/g5.mp4 b/video/g5.mp4
deleted file mode 100644
index 570041c..0000000
Binary files a/video/g5.mp4 and /dev/null differ
diff --git a/video/g6.mp4 b/video/g6.mp4
deleted file mode 100644
index 5cc4d85..0000000
Binary files a/video/g6.mp4 and /dev/null differ
diff --git a/video/g7.mp4 b/video/g7.mp4
deleted file mode 100644
index c615888..0000000
Binary files a/video/g7.mp4 and /dev/null differ
diff --git a/video/g8.mp4 b/video/g8.mp4
deleted file mode 100644
index 02229f3..0000000
Binary files a/video/g8.mp4 and /dev/null differ
diff --git a/video/gear.mp4 b/video/gear.mp4
deleted file mode 100644
index ea84ff0..0000000
Binary files a/video/gear.mp4 and /dev/null differ
diff --git a/video/gear_comparison_iwr.mp4 b/video/gear_comparison_iwr.mp4
deleted file mode 100644
index 05dccd8..0000000
Binary files a/video/gear_comparison_iwr.mp4 and /dev/null differ
diff --git a/video/gear_comparison_ours.mp4 b/video/gear_comparison_ours.mp4
deleted file mode 100644
index 62e1d3e..0000000
Binary files a/video/gear_comparison_ours.mp4 and /dev/null differ
diff --git a/video/kcup.mp4 b/video/kcup.mp4
deleted file mode 100644
index 994a164..0000000
Binary files a/video/kcup.mp4 and /dev/null differ
diff --git a/video/kcup_comparison_iwr.mp4 b/video/kcup_comparison_iwr.mp4
deleted file mode 100644
index 9993073..0000000
Binary files a/video/kcup_comparison_iwr.mp4 and /dev/null differ
diff --git a/video/kcup_comparison_ours.mp4 b/video/kcup_comparison_ours.mp4
deleted file mode 100644
index eba1204..0000000
Binary files a/video/kcup_comparison_ours.mp4 and /dev/null differ
diff --git a/video/memory.mp4 b/video/memory.mp4
deleted file mode 100644
index 6ff2777..0000000
Binary files a/video/memory.mp4 and /dev/null differ
diff --git a/video/memory_longer.mp4 b/video/memory_longer.mp4
deleted file mode 100644
index b95d1a1..0000000
Binary files a/video/memory_longer.mp4 and /dev/null differ
diff --git a/video/memory_new.mp4 b/video/memory_new.mp4
deleted file mode 100644
index 2331824..0000000
Binary files a/video/memory_new.mp4 and /dev/null differ
diff --git a/video/model.mp4 b/video/model.mp4
deleted file mode 100644
index d3fe008..0000000
Binary files a/video/model.mp4 and /dev/null differ
diff --git a/video/model_new.mp4 b/video/model_new.mp4
deleted file mode 100644
index dd57dd0..0000000
Binary files a/video/model_new.mp4 and /dev/null differ
diff --git a/video/model_new1.mp4 b/video/model_new1.mp4
deleted file mode 100644
index 862d724..0000000
Binary files a/video/model_new1.mp4 and /dev/null differ
diff --git a/video/model_new_longer.mp4 b/video/model_new_longer.mp4
deleted file mode 100644
index 9ff2b69..0000000
Binary files a/video/model_new_longer.mp4 and /dev/null differ
diff --git a/video/momo_archi.mov b/video/momo_archi.mov
new file mode 100644
index 0000000..2e5b30c
Binary files /dev/null and b/video/momo_archi.mov differ
diff --git a/video/momo_demo.mp4 b/video/momo_demo.mp4
new file mode 100644
index 0000000..cbedb04
Binary files /dev/null and b/video/momo_demo.mp4 differ
diff --git a/video/momo_detection.mp4 b/video/momo_detection.mp4
new file mode 100644
index 0000000..ac0f628
Binary files /dev/null and b/video/momo_detection.mp4 differ
diff --git a/video/momo_model.mp4 b/video/momo_model.mp4
new file mode 100644
index 0000000..9e187dc
Binary files /dev/null and b/video/momo_model.mp4 differ
diff --git a/video/overview.mov b/video/overview.mov
deleted file mode 100644
index bb0ee14..0000000
Binary files a/video/overview.mov and /dev/null differ
diff --git a/video/real.mp4 b/video/real.mp4
deleted file mode 100644
index afeb67a..0000000
Binary files a/video/real.mp4 and /dev/null differ
diff --git a/video/real2sim.mp4 b/video/real2sim.mp4
deleted file mode 100644
index df14131..0000000
Binary files a/video/real2sim.mp4 and /dev/null differ
diff --git a/video/sim.mp4 b/video/sim.mp4
deleted file mode 100644
index 2ac65b0..0000000
Binary files a/video/sim.mp4 and /dev/null differ
diff --git a/video/task_new.mp4 b/video/task_new.mp4
deleted file mode 100644
index 4735853..0000000
Binary files a/video/task_new.mp4 and /dev/null differ
diff --git a/video/toolhang1.mp4 b/video/toolhang1.mp4
deleted file mode 100644
index 84aaa2f..0000000
Binary files a/video/toolhang1.mp4 and /dev/null differ