ExplorerArtificial IntelligenceAI
Research PaperResearchia:202603.04001

Tool Verification for Test-Time Reinforcement Learning

Ruotong Liao

Abstract

Test-time reinforcement learning (TTRL) has emerged as a promising paradigm for self-evolving large reasoning models (LRMs), enabling online adaptation on unlabeled test inputs via self-induced rewards through majority voting. However, a spurious yet high-frequency unverified consensus can become a biased and reinforced reward signal, leading to incorrect mode collapse. We address this failure mode with T^3RL (Tool-Verification for Test-Time Reinforcement Learning), which introduces test-time to...

Submitted: March 4, 2026Subjects: AI; Artificial Intelligence

Description / Details

Test-time reinforcement learning (TTRL) has emerged as a promising paradigm for self-evolving large reasoning models (LRMs), enabling online adaptation on unlabeled test inputs via self-induced rewards through majority voting. However, a spurious yet high-frequency unverified consensus can become a biased and reinforced reward signal, leading to incorrect mode collapse. We address this failure mode with T^3RL (Tool-Verification for Test-Time Reinforcement Learning), which introduces test-time tool verification into reward estimation. Concretely, a verifier uses an external tool as evidence (e.g., from code execution) to upweight verified rollouts in a verification-aware voting, producing more reliable pseudo-labels for training. Across various math difficulties (MATH-500, AMC, and AIME 2024) and diverse backbone types, T^3RL significantly improves over TTRL, with larger gains on harder problems. More broadly, T^3RL can be viewed as verified online data synthesis, highlighting test-time tool verification as a key mechanism for stabilizing self-evolution.


Source: arXiv:2603.02203v1 - http://arxiv.org/abs/2603.02203v1 PDF: https://arxiv.org/pdf/2603.02203v1 Original Link: http://arxiv.org/abs/2603.02203v1

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
Mar 4, 2026
Topic:
Artificial Intelligence
Area:
AI
Comments:
0
Bookmark
Tool Verification for Test-Time Reinforcement Learning | Researchia