ICRA 2026 Beyond Teleoperation Workshop

Introduction

Teleoperation has driven progress in robot learning, but it is fundamentally limiting: control interfaces introduce latency and reduce degrees of freedom, producing demonstrations that lack the natural dexterity and diverse strategies humans exhibit in everyday manipulation. Human video and simulation each offer qualitatively different strengths. Unscripted human video captures whole-body coordination, tool use, and manipulation strategies that no teleoperation interface can reproduce. Simulation enables exploration of contact-rich dynamics, failure recovery, and edge cases at scale. Yet each carries its own challenge: embodiment mismatch for human video, and the sim-to-real gap for simulation. The central question is not just how we get more data, but how we extract what makes these sources qualitatively richer and bridge the gaps that separate them from deployable robot skills.

This workshop focuses on methods that:

Learn dexterous manipulation skills from human demonstrations, going beyond what teleoperation interfaces can capture
Build world models grounded in real-world physics from diverse, in-the-wild video data
Use small, high-quality teleoperation data as an anchor alongside large-scale, diverse off-domain sources
Bridge embodiment and sim-to-real gaps to transfer manipulation skills across domains

We bring together researchers working on learning from human data, simulation, and robot teleoperation to share insights and collaborate on building more general-purpose manipulation systems.

Confirmed Speakers

Imperial College London

Roberto Martín-Martín

UT Austin

Panel Discussion

Our panel with the invited speakers (16:20–17:20) will cover questions such as:

Does the community truly need to move beyond teleoperation? Should we favor massive (but noisy) human/sim data, or smaller (but clean) robot datasets?
What are the "hidden" costs behind using human/simulation data? Can we combine both sources to get the best of both worlds?
Is small, high-quality teleop data a necessary "anchor" for grounding large-scale human video/simulation training?

Workshop Schedule

Start Time	End Time	Event
08:30	09:00	Welcome (Organizers)
09:00	09:30	Talk 1: Jitendra Malik
09:30	10:00	Talk 2: Edward Johns
10:00	11:00	Break + Poster Session
11:00	11:30	Talk 3: Danfei Xu
11:30	12:00	Talk 4: Karen Liu
12:00	12:30	Talk 5: Katerina Fragkiadaki
12:30	13:30	Break
13:30	14:00	Talk 6: Yue Wang
14:00	14:30	Talk 7: Roberto Martín-Martín
14:30	15:00	Spotlight Talks (4 papers)
15:00	16:00	Break + Poster Session
16:00	16:20	Sponsor Talk: Steve Xie (Lightwheel)
16:20	17:20	Panel Discussion
17:20	17:30	Closing Remarks

Accepted Papers

All accepted papers will be presented as posters across two sessions. Four papers were also selected for spotlight talks.

Spotlight Talks — 14:30–15:00

MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation
UniDex-ViTac: Learning Unified Visuo-Tactile Dexterous Manipulation Policy from Human Video Data
DemoDiffusion: One-Shot Human Imitation using Pre-trained Diffusion Policy
Humanoid Bimanual Dexterous Manipulation Driven by Egocentric Video

Poster Session 1 — 10:00–11:00

Reconstructing Hand-Held Objects in 3D from Images and Videos
MotionTrans: Human VR Data Enable Motion-Level Learning for Robotic Manipulation Policies
One-Shot Learning of Manipulation from RGB-D Videos via Object-Centric Interaction Reasoning
Humanoid Bimanual Dexterous Manipulation Driven by Egocentric Video
YUBI: Yielding Universal Bidigital Interface for Bimanual Dexterous Manipulation at Scale
Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation
Tune to Learn: How Controller Gains Affect Robot Policy Learning
CRAFT: Video Diffusion for Bimanual Robot Data Generation
Whole-Body Mobile Manipulation using Offline Reinforcement Learning on Sub-optimal Controllers
Point Bridge: 3D Representations for Cross Domain Policy Learning
IFG: Internet-Scale Guidance for Functional Grasping Generation
MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation
HumanoidMimicGen: Data Generation for Loco-Manipulation via Whole-Body Planning and Adaptation
X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations

Poster Session 2 — 15:00–16:00

UniDex-ViTac: Learning Unified Visuo-Tactile Dexterous Manipulation Policy from Human Video Data
PHABS: A Handheld Haptic Device for Force-Annotated Bimanual Demonstration Data
Learning Whole-Body Humanoid Locomotion via Motion Generation and Motion Tracking
Learning Quadruped Locomotion from Casual Videos
HOMimic: Distilling Manipulation Trajectories from Human Videos via Multi-Stage Interaction Reasoning and Taxonomy-Aware Retargeting
Few-Shot Learning of Tool-Use Skills with Proximity and Tactile Sensing
Object-Centric Reward Learning from Action-Free Videos for Long-Horizon Manipulation Beyond Teleoperation
Semantic–Geometric Task Representations for Bimanual Manipulation from Human Demonstrations to Robot Action Planning
MobileEgo Anywhere: Open Infrastructure for Long-Horizon Egocentric Data on Commodity Hardware
Learning Sim-Grounded Policies for Bimanual Rope Manipulation from Human Teleoperation Data
DemoDiffusion: One-Shot Human Imitation using Pre-trained Diffusion Policy
Overcoming Distribution Shifts with Autonomous Embodied Data Collection
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow
UniLatent: Cross-Embodiment Transfer via Latent Observation Alignment
Learning Structured Policies for General Humanoid Loco-Manipulation