Research Paper | Researchia:202606.04004

Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)

Nizar Islah

Submitted: June 4, 2026
Subjects: AI; Artificial Intelligence

Description / Details

When post-trained language models fail on reasoning problems, the common test-time-scaling response is to spend more compute on additional attempts, and the failed traces play no further role. We argue this discards a crucial signal; some failures come from unlucky sampling, where more rollouts help, while others are structural and resist resampling regardless of budget. We propose that failed traces encode recoverability structure: the inference-time signature of which test-time interventions can rescue a given failure. Three problem-level trajectory features, derived from the structure of available interventions, recover this structure from the distributional signature of failed rollouts, not their text. They cluster failures into stable regimes, characterize the failure topography of different post-training methods (84.3 ± 4.3% accuracy, +20% over a majority-class baseline), and support a training-free routing rule that lifts rescue by +12.2% on the deployment-relevant Steerable-Hard subset (failures where retry is insufficient and a bounded intervention is reachable). The features and the routing rule transfer across two cross-family probes. The same three features thus convert failed traces from discarded data into a diagnostic object, supporting test-time routing and post-training analysis without training-time or weight-space access.
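To make the idea of a training-free routing rule over distributional features concrete, here is a minimal sketch. The abstract does not name the three features or the thresholds, so everything below is an assumption for illustration: the two features (`answer_entropy`, `modal_share`), the cutoff values, and the route labels `"resample"` and `"intervene"` are hypothetical stand-ins and not the paper's method. The sketch only shows the general shape: summarize the distribution of final answers across failed rollouts, without reading the trace text, and choose between spending more samples and applying a bounded intervention.

```python
from collections import Counter
from math import log


def answer_entropy(answers):
    """Shannon entropy (in nats) of the final answers across failed rollouts."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * log(c / n) for c in counts.values())


def modal_share(answers):
    """Fraction of rollouts that ended on the single most common answer."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)


def route(answers, entropy_cut=1.0, share_cut=0.6):
    """Hypothetical routing rule (illustrative thresholds, not from the paper).

    - Rollouts concentrated on one wrong answer are treated as structural,
      so resampling is assumed unlikely to help: route to an intervention.
    - Rollouts scattered over many answers are treated as sampling noise:
      route to additional attempts.
    """
    if modal_share(answers) >= share_cut:
        return "intervene"
    if answer_entropy(answers) >= entropy_cut:
        return "resample"
    return "intervene"


# Example: eight failed rollouts that all commit to the same wrong answer,
# versus eight failed rollouts that each land on a different answer.
stuck = ["7"] * 8
scattered = [str(i) for i in range(8)]
print(route(stuck), route(scattered))
```

The rule uses only counts of final answers, which mirrors the abstract's point that the signal lives in the distribution of failed rollouts and not in their text; a faithful implementation would substitute the paper's actual three features and intervention set.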


Source: arXiv:2606.05145v1 - http://arxiv.org/abs/2606.05145v1
PDF: https://arxiv.org/pdf/2606.05145v1
Original Link: http://arxiv.org/abs/2606.05145v1
