Research Paper | Researchia:202603.04076

$π$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs

Siting Wang

Submitted: March 4, 2026
Subjects: Robotics

Description / Details

Flow-based vision-language-action (VLA) models excel in embodied control but suffer from intractable likelihoods during multi-step sampling, hindering online reinforcement learning. We propose $π$-StepNFT (Step-wise Negative-aware Fine-Tuning), a critic-and-likelihood-free framework that requires only a single forward pass per optimization step and eliminates auxiliary value networks. We identify that wider exploration spaces necessitate finer-grained, step-wise guidance for alignment. Empirically, $π$-StepNFT unlocks latent potential on LIBERO with competitive few-shot robustness. Moreover, it achieves superior generalization on ManiSkill, outperforming value-based baselines in OOD scenarios by preventing overfitting to multimodal features. This property offers a scalable solution promising for complex real-world applications.


Source: arXiv:2603.02083v1 - http://arxiv.org/abs/2603.02083v1
PDF: https://arxiv.org/pdf/2603.02083v1
