Research Paper | Researchia:202602.25008 [Computational Linguistics > NLP]

Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs

Yining Hong

Abstract

Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials in which mistakes repeat rather than accumulate into experience. Drawing on human reflective practitioners, we introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action, where the agent uses test-time scaling to generate and score multiple candidate actions with internal reflections before execution; and reflection-on-action, where the agent uses test-time training to update both its internal reflection model and its action policy from external reflections after execution. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates in hindsight for proper long-horizon credit assignment. Experiments on our newly designed Long-Horizon Household benchmark and MuJoCo Cupboard Fitting benchmark show significant gains over baseline models, with ablation studies validating the complementary roles of reflection-in-action and reflection-on-action. Qualitative analyses, including real-robot trials, highlight behavioral correction through reflection.
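The abstract describes a two-phase loop: score candidate actions with an internal reflection signal before acting (reflection-in-action), update that signal and the policy from the observed outcome after acting (reflection-on-action), and revisit earlier steps once the final outcome is known (retrospective reflection). The sketch below illustrates that loop in miniature; the class name ReflectiveAgent, the scoring and update rules, and the toy reward are illustrative assumptions, not the paper's actual models or benchmarks.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReflectiveAgent:
    """Toy agent: proposes candidate actions, scores them with an internal
    reflection score, and adjusts those scores from external feedback."""
    actions: list
    # Learned preference per action; stands in for the internal reflection model.
    scores: dict = field(default_factory=dict)
    lr: float = 0.5

    def reflect_in_action(self, k: int = 3) -> str:
        """Test-time scaling: sample k candidate actions and pick the one the
        internal reflection model currently scores highest."""
        candidates = random.sample(self.actions, min(k, len(self.actions)))
        return max(candidates, key=lambda a: self.scores.get(a, 0.0))

    def reflect_on_action(self, action: str, reward: float) -> None:
        """Test-time training: nudge the internal score toward the external
        outcome observed after execution."""
        old = self.scores.get(action, 0.0)
        self.scores[action] = old + self.lr * (reward - old)


def run_episode(agent: ReflectiveAgent, env_reward, steps: int = 5):
    trajectory = []
    for _ in range(steps):
        action = agent.reflect_in_action()       # reflection-in-action
        reward = env_reward(action)              # execute, observe outcome
        agent.reflect_on_action(action, reward)  # reflection-on-action
        trajectory.append((action, reward))
    # Retrospective reflection: with the final outcome known, revisit earlier
    # decisions and re-assign credit (here, a uniform hindsight update).
    final_reward = trajectory[-1][1]
    for action, _ in trajectory[:-1]:
        agent.reflect_on_action(action, final_reward)
    return trajectory


if __name__ == "__main__":
    agent = ReflectiveAgent(actions=["open_drawer", "grasp_cup", "place_cup"])
    # Toy environment: only "place_cup" succeeds.
    print(run_episode(agent, lambda a: 1.0 if a == "place_cup" else 0.0))
```

In this simplified form, reflection-in-action is a selection step over sampled candidates and reflection-on-action is an incremental score update; the paper's method applies the same pattern with an LLM-based reflection model and policy rather than a lookup table.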


Source: arXiv:2602.21198v1 (http://arxiv.org/abs/2602.21198v1)
PDF: https://arxiv.org/pdf/2602.21198v1

Submission: 2/25/2026
Subjects: NLP; Computational Linguistics

