Research Paper | Researchia:202602.25070 [Robotics > Robotics]

Notes-to-Self: Scratchpad Augmented VLAs for Memory Dependent Manipulation Tasks

Sanjay Haresh

Abstract

Many dexterous manipulation tasks are non-Markovian in nature, yet little attention has been paid to this fact in the recent upsurge of the vision-language-action (VLA) paradigm. Although they are successful in bringing internet-scale semantic understanding to robotics, existing VLAs are primarily "stateless" and struggle with memory-dependent, long-horizon tasks. In this work, we explore a way to impart both spatial and temporal memory to a VLA by incorporating a language scratchpad. The scratchpad makes it possible to memorize task-specific information, such as object positions, and it allows the model to keep track of a plan and progress towards subgoals within that plan. We evaluate this approach on a split of memory-dependent tasks from the ClevrSkills environment, on MemoryBench, as well as on a challenging real-world pick-and-place task. We show that incorporating a language scratchpad significantly improves generalization on these tasks for both non-recurrent and recurrent models.
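The abstract does not describe how the scratchpad is implemented, so the following is only a minimal sketch of the general idea: a text memory that is carried from one control step to the next, storing object positions and progress through a plan. Every name here (`Scratchpad`, `toy_policy`, `run_episode`, the observation format) is a hypothetical stand-in, and `toy_policy` is a hand-written stub, not a VLA and not the authors' method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Scratchpad:
    """Hypothetical text memory carried across control steps."""
    object_notes: Dict[str, str] = field(default_factory=dict)  # spatial memory
    plan: List[str] = field(default_factory=list)               # ordered subgoals
    done: int = 0                                                # subgoals completed

    def render(self) -> str:
        """Serialize the memory as plain text, as it might be fed back to a model."""
        lines = [f"{name} at {pos}" for name, pos in sorted(self.object_notes.items())]
        for i, step in enumerate(self.plan):
            mark = "x" if i < self.done else " "
            lines.append(f"[{mark}] {step}")
        return "\n".join(lines)


def toy_policy(observation: dict, instruction: str, pad: Scratchpad) -> str:
    """Stub standing in for a model forward pass (assumption, not a real VLA).

    It records any currently visible objects in the scratchpad, then returns
    the next unfinished subgoal as the action and marks it complete.
    """
    pad.object_notes.update(observation.get("visible", {}))
    if pad.done >= len(pad.plan):
        return "stop"
    action = pad.plan[pad.done]
    pad.done += 1
    return action


def run_episode(observations: List[dict], instruction: str,
                plan: List[str]) -> Tuple[List[str], Scratchpad]:
    """Run the stub policy over a sequence of observations with one shared scratchpad."""
    pad = Scratchpad(plan=list(plan))
    actions = [toy_policy(obs, instruction, pad) for obs in observations]
    return actions, pad


if __name__ == "__main__":
    # The cube is visible only in the first frame; the note persists afterwards.
    frames = [{"visible": {"red cube": "(0.2, 0.1)"}}, {"visible": {}}, {"visible": {}}]
    actions, pad = run_episode(frames, "put the red cube in the bin",
                               ["pick red cube", "place in bin"])
    print(actions)
    print(pad.render())
```

In this toy setup the object position is observed only once, yet it remains available in later steps because it was written to the scratchpad. This mirrors the kind of memory dependence the abstract refers to, without claiming anything about the paper's actual architecture or training.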


Source: arXiv:2602.21013v1 (http://arxiv.org/abs/2602.21013v1)
PDF: https://arxiv.org/pdf/2602.21013v1

Submission: 2/25/2026
Subjects: Robotics
arXiv: This paper is hosted on arXiv, an open-access repository