Research Paper | Researchia:202606.30058

Learning from Mistakes: Rollout-Retrieval Lifelong Policy Learning for Autonomous Driving

Cheng Gong

Submitted: June 30, 2026
Subjects: Machine Learning; Data Science

Description / Details

Autonomous driving policies should be able to improve continually as deployment exposes them to increasingly diverse and long-tail traffic situations. However, most learning-based policies are trained or fine-tuned on expert demonstrations and then rely largely on generalization to handle challenging closed-loop scenarios, lacking an explicit mechanism to correct and retain the mistakes exposed in these scenarios. This paper studies autonomous driving policy improvement from a lifelong learning perspective: Can a pretrained policy improve continually by accumulating corrective knowledge derived from its own mistakes, while retaining previously acquired driving competence? To answer this question, we propose Rollout-Retrieval Lifelong Policy Learning (R^2LPL), a policy learning framework that retrieves corrective targets from recoverable policy-induced mistakes and retains the resulting knowledge through lifelong policy learning. R^2LPL addresses a key bottleneck in continual policy improvement: closed-loop mistakes reveal where the policy is weak, but do not directly specify what the policy should learn. By filtering recoverable mistake-related states and retrieving feasible corrective targets, R^2LPL turns sparse failure evidence into compact supervised knowledge for stable and sample-efficient policy improvement. We evaluate R^2LPL on large-scale closed-loop nuPlan benchmarks. With only a few rollout and continual-learning cycles, R^2LPL elevates a learning-based planner with moderate initial performance to state-of-the-art performance across the evaluated benchmarks, especially on the challenging and long-tail Test14-hard split. These results demonstrate the effectiveness of R^2LPL in converting recoverable closed-loop mistakes into corrective knowledge for sustained policy improvement.
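The abstract describes a cycle of rolling out the policy, filtering recoverable mistakes, retrieving corrective targets, and learning from the accumulated targets. The following is a minimal toy sketch of that cycle and not the authors' implementation: the 1-D lateral-offset state, the linear policy, the constants `MAX_STEER` and `FAIL_AT`, and every function name are assumptions made for illustration. In particular, the "retrieval" step here is a stand-in rule (the action that returns the vehicle to the lane centre), and retention is approximated by refitting on the entire buffer of past corrections.

```python
from typing import List, Tuple

MAX_STEER = 1.0  # assumed actuator limit: largest correction available in one step
FAIL_AT = 0.5    # assumed failure threshold on |lateral offset|


def rollout(gain: float, starts: List[float], steps: int = 5) -> List[float]:
    """Run the toy policy a = -gain * s in closed loop; return the failure states."""
    mistakes = []
    for s in starts:
        for _ in range(steps):
            s = s + (-gain * s)  # toy dynamics: next offset = offset + action
            if abs(s) > FAIL_AT:
                mistakes.append(s)  # record the first state where this rollout fails
                break
    return mistakes


def corrective_targets(mistakes: List[float]) -> List[Tuple[float, float]]:
    """Keep recoverable mistakes and pair each with a feasible corrective action."""
    # Recoverable (by assumption): one step within the actuator limit can re-centre.
    return [(s, -s) for s in mistakes if abs(s) <= MAX_STEER]


def refit(gain: float, buffer: List[Tuple[float, float]], lr: float = 0.5) -> float:
    """Move the gain part of the way toward the least-squares fit over the buffer."""
    if not buffer:
        return gain
    # Minimising sum((a + g * s)^2) over g gives g = -sum(s * a) / sum(s * s).
    best = -sum(s * a for s, a in buffer) / sum(s * s for s, _ in buffer)
    return gain + lr * (best - gain)


def improve(gain: float, starts: List[float], cycles: int = 3):
    """Alternate rollouts and refits; the buffer keeps corrections from all cycles."""
    buffer: List[Tuple[float, float]] = []
    for _ in range(cycles):
        buffer.extend(corrective_targets(rollout(gain, starts)))
        gain = refit(gain, buffer)  # replaying old pairs stands in for retention
    return gain, buffer
```

Calling `improve(0.2, [0.4, 1.0, 3.0])` runs three cycles from a deliberately weak gain. In this toy setting a mistake that is unrecoverable in an early cycle can become recoverable once the policy has improved, which loosely mirrors the idea of growing competence cycle by cycle.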


Source: arXiv:2606.30537v1 (http://arxiv.org/abs/2606.30537v1)
PDF: https://arxiv.org/pdf/2606.30537v1
