Research Paper
Researchia:202602.18056

BFS-PO: Best-First Search for Large Reasoning Models

Fiorenzo Parascandolo


Submitted: February 18, 2026
Subjects: AI; Artificial Intelligence

Description / Details

Large Reasoning Models (LRMs) such as OpenAI o1 and DeepSeek-R1 have shown excellent performance on reasoning tasks by using long reasoning chains. However, this has also led to a significant increase in computational cost and to verbose output, a phenomenon known as overthinking. The tendency to overthink is often exacerbated by Reinforcement Learning (RL) algorithms such as GRPO and DAPO. In this paper, we propose BFS-PO, an RL algorithm that alleviates this problem using a Best-First Search exploration strategy. Specifically, BFS-PO searches for the shortest correct answer using a backtracking mechanism based on maximum-entropy nodes. By generating progressively shorter responses during training, BFS-PO learns to produce concise reasoning chains. Across different benchmarks and base LRMs, we show that BFS-PO can simultaneously increase LRM accuracy and shorten its answers.
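
To make the idea of backtracking at maximum-entropy nodes more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so every name (`token_entropy`, `max_entropy_position`, `shortest_correct_search`, `toy_rollout`), the use of prefix length as the best-first priority, and the rule of branching once per rollout at the highest-entropy token are assumptions made purely for illustration. The RL policy update is omitted entirely; only the search over reasoning chains is shown, with a scripted stand-in for the model.

```python
import heapq
import math
from typing import Callable, List, Optional, Sequence, Tuple

Rollout = Callable[[List[str]], Tuple[List[str], List[Sequence[float]]]]


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def max_entropy_position(dists: Sequence[Sequence[float]]) -> int:
    """Index of the generation step whose distribution is most uncertain."""
    entropies = [token_entropy(d) for d in dists]
    return max(range(len(entropies)), key=entropies.__getitem__)


def shortest_correct_search(
    rollout: Rollout,
    is_correct: Callable[[List[str]], bool],
    budget: int = 8,
) -> Optional[List[str]]:
    """Best-first search over prefixes, keeping the shortest correct chain.

    Assumption: the frontier is ordered by prefix length (shorter first),
    and each rollout contributes one backtracking point, placed just before
    its highest-entropy token so that this token is resampled.
    """
    best: Optional[List[str]] = None
    counter = 0  # tie-breaker so the heap never compares token lists
    frontier: List[Tuple[int, int, List[str]]] = [(0, counter, [])]
    for _ in range(budget):
        if not frontier:
            break
        _, _, prefix = heapq.heappop(frontier)
        continuation, dists = rollout(prefix)
        chain = prefix + continuation
        if is_correct(chain) and (best is None or len(chain) < len(best)):
            best = chain
        if continuation:
            pos = max_entropy_position(dists)
            new_prefix = chain[: len(prefix) + pos]
            counter += 1
            heapq.heappush(frontier, (len(new_prefix), counter, new_prefix))
    return best


def toy_rollout(prefix: List[str]) -> Tuple[List[str], List[Sequence[float]]]:
    """Scripted stand-in for an LRM: returns a continuation and the
    next-token distribution at each generated step (hypothetical data)."""
    if prefix == ["a"]:
        return ["ans"], [[0.9, 0.1]]
    return (
        ["a", "b", "c", "d", "ans"],
        [[1.0], [0.5, 0.5], [1.0], [1.0], [1.0]],
    )


result = shortest_correct_search(
    toy_rollout, lambda chain: chain[-1] == "ans", budget=3
)
print(result)
```

In this toy run the first rollout yields a five-token correct chain whose most uncertain step is the second token, so the search backtracks to the prefix `["a"]` and resamples from there, finding a two-token correct chain. In a real setting `rollout` would sample from the model, so expanding the same prefix more than once could yield different continuations.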


Source: arXiv:2602.14917v1 - http://arxiv.org/abs/2602.14917v1
PDF: https://arxiv.org/pdf/2602.14917v1
