Research Paper (Researchia:202606.23087)

KEMO: Event-Driven Keyframe Memory for Long-Horizon Robot Manipulation with VLA Policies

Yihan Zeng


Submitted: June 23, 2026
Subjects: Robotics

Description / Details

Long-horizon robot manipulation remains challenging because similar observations may occur at different execution stages, while the appropriate action depends on previously completed operations. Memory can address this ambiguity by enabling policies to infer task progress from execution history. However, existing memory-augmented approaches often either retain dense histories that require compression or rely primarily on recent context that may discard earlier task-relevant events. In this work, we propose KEMO, a lightweight plug-in memory framework for VLA policies that automatically and selectively preserves keyframes associated with task-relevant state changes. KEMO combines robot kinematics with visual filtering to detect events, encodes the selected keyframes as compact, temporally ordered memory tokens, and integrates them with current visual features through cross-attention and gated residual fusion for VLA training. The detected events also define higher-weight training samples near critical transitions. We evaluate KEMO on a range of real-world dual-arm manipulation tasks spanning 2 to 6 scored subtasks, with trajectory lengths ranging from 830 to 2846 execution steps (durations from 28 to 95 seconds). Compared with the memory-free baseline (e.g., π_{0.5}), KEMO improves aggregate Task Success Rate by 23.6% and Stage Completion Rate by 34.1%. Ablations show that event-driven keyframe selection outperforms uniform sampling and recent-frame retention, while the proposed gated fusion and keyframe-aligned loss weighting provide complementary gains.
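
The event-driven selection and keyframe-aligned loss weighting described above can be pictured with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the detection rule, so a gripper open/close toggle stands in here for the kinematic event and a per-step image-change score stands in for the visual filter. The function names, thresholds, window size, and boost factor are all hypothetical.

```python
from typing import List, Sequence


def detect_keyframes(
    gripper_open: Sequence[float],
    visual_change: Sequence[float],
    grip_threshold: float = 0.5,    # hypothetical open/closed cutoff
    visual_threshold: float = 0.2,  # hypothetical minimum image change
    min_gap: int = 5,               # hypothetical spacing between keyframes
) -> List[int]:
    """Return step indices where the gripper toggles AND the image changes.

    Assumed rule for illustration only; the paper's detector may differ.
    """
    keyframes: List[int] = []
    for t in range(1, len(gripper_open)):
        was_open = gripper_open[t - 1] > grip_threshold
        is_open = gripper_open[t] > grip_threshold
        kinematic_event = was_open != is_open
        visually_distinct = visual_change[t] >= visual_threshold
        far_enough = not keyframes or t - keyframes[-1] >= min_gap
        if kinematic_event and visually_distinct and far_enough:
            keyframes.append(t)
    return keyframes


def loss_weights(
    num_steps: int,
    keyframes: Sequence[int],
    window: int = 3,     # hypothetical half-width around each keyframe
    boost: float = 2.0,  # hypothetical weight for steps near a keyframe
) -> List[float]:
    """Give steps within `window` of any keyframe a higher training weight."""
    weights = [1.0] * num_steps
    for k in keyframes:
        for t in range(max(0, k - window), min(num_steps, k + window + 1)):
            weights[t] = boost
    return weights


# Toy 30-step trajectory: the gripper closes at step 10 and reopens at step 20.
gripper = [1.0] * 10 + [0.0] * 10 + [1.0] * 10
visual = [0.0] * 30
visual[10] = 0.6   # clear scene change when the gripper closes
visual[20] = 0.05  # negligible change, so the visual filter rejects this toggle

keyframes = detect_keyframes(gripper, visual)
weights = loss_weights(len(gripper), keyframes, window=2)
print(keyframes)  # [10]
```

In this toy trajectory only the first gripper toggle survives the visual filter, and steps 8 through 12 receive the boosted weight. In a real training loop the per-step weights would multiply the per-sample action loss; how KEMO actually sets them is not stated in the abstract.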


Source: arXiv:2606.23589v1 (http://arxiv.org/abs/2606.23589v1)
PDF: https://arxiv.org/pdf/2606.23589v1
