Research Paper | Researchia: 202603.05090 [Robotics]

HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations

Xiaomeng Xu

Abstract

We present the Whole-Body Mobile Manipulation Interface (HoMMI), a data collection and policy learning framework that learns whole-body mobile manipulation directly from robot-free human demonstrations. We augment UMI interfaces with egocentric sensing to capture the global context required for mobile manipulation, enabling portable, robot-free, and scalable data collection. However, naively incorporating egocentric sensing introduces a larger human-to-robot embodiment gap in both the observation and action spaces, making policy transfer difficult. We explicitly bridge this gap with a cross-embodiment hand-eye policy design comprising an embodiment-agnostic visual representation, a relaxed head action representation, and a whole-body controller that realizes hand-eye trajectories through coordinated whole-body motion under robot-specific physical constraints. Together, these components enable long-horizon mobile manipulation tasks requiring bimanual and whole-body coordination, navigation, and active perception. Results are best viewed at https://hommi-robot.github.io
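As a rough illustration of the controller component described in the abstract, the sketch below shows a generic damped-least-squares whole-body step that tracks a stacked hand-position and head-heading target through coordinated base, arm, and head motion, with a joint-velocity clamp standing in for robot-specific limits. The planar kinematic model, link lengths, and all names (e.g. forward_kinematics, whole_body_step) are simplifying assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Configuration q = [base_x, base_y, base_yaw, shoulder, elbow, head_pan]
N_JOINTS = 6


def forward_kinematics(q):
    """Hand position (x, y) and head heading for a toy planar mobile manipulator."""
    bx, by, byaw, q1, q2, qh = q
    l1, l2 = 0.4, 0.3                       # assumed link lengths [m]
    a1 = byaw + q1
    a2 = a1 + q2
    hand_x = bx + l1 * np.cos(a1) + l2 * np.cos(a2)
    hand_y = by + l1 * np.sin(a1) + l2 * np.sin(a2)
    head_heading = byaw + qh                # camera yaw in the world frame
    return np.array([hand_x, hand_y, head_heading])


def jacobian(q, eps=1e-6):
    """Finite-difference Jacobian of the stacked hand/head task."""
    f0 = forward_kinematics(q)
    J = np.zeros((f0.size, N_JOINTS))
    for i in range(N_JOINTS):
        dq = np.zeros(N_JOINTS)
        dq[i] = eps
        J[:, i] = (forward_kinematics(q + dq) - f0) / eps
    return J


def whole_body_step(q, target, gain=1.0, damping=1e-2, dq_max=0.1):
    """One damped-least-squares step toward the hand/head target.

    The per-step clamp on joint motion is a crude stand-in for the
    robot-specific physical constraints mentioned in the abstract.
    """
    error = target - forward_kinematics(q)
    J = jacobian(q)
    JJt = J @ J.T + damping * np.eye(J.shape[0])
    dq = J.T @ np.linalg.solve(JJt, gain * error)
    return q + np.clip(dq, -dq_max, dq_max)


if __name__ == "__main__":
    q = np.zeros(N_JOINTS)
    # Target: hand at (1.5, 0.8) m and camera yawed 0.5 rad, as one stacked task,
    # so the base drives, the arm reaches, and the head turns in coordination.
    target = np.array([1.5, 0.8, 0.5])
    for _ in range(300):
        q = whole_body_step(q, target)
    print("remaining task error:", target - forward_kinematics(q))
```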


Source: arXiv:2603.03243v1 (http://arxiv.org/abs/2603.03243v1)
PDF: https://arxiv.org/pdf/2603.03243v1

Submission: 3/5/2026
Subjects: Robotics