ExplorerRoboticsRobotics
Research PaperResearchia:202604.20074

VADF: Vision-Adaptive Diffusion Policy Framework for Efficient Robotic Manipulation

Xinglei Yu

Abstract

Diffusion policies are becoming mainstream in robotic manipulation but suffer from hard negative class imbalance due to uniform sampling and lack of sample difficulty awareness, leading to slow training convergence and frequent inference timeout failures. We propose VADF (Vision-Adaptive Diffusion Policy Framework), a vision-driven dual-adaptive framework that significantly reduces convergence steps and achieves early success in inference, with model-agnostic design enabling seamless integration...

Submitted: April 20, 2026Subjects: Robotics; Robotics

Description / Details

Diffusion policies are becoming mainstream in robotic manipulation but suffer from hard negative class imbalance due to uniform sampling and lack of sample difficulty awareness, leading to slow training convergence and frequent inference timeout failures. We propose VADF (Vision-Adaptive Diffusion Policy Framework), a vision-driven dual-adaptive framework that significantly reduces convergence steps and achieves early success in inference, with model-agnostic design enabling seamless integration into any diffusion policy architecture. During training, we introduce Adaptive Loss Network (ALN), a lightweight MLP-based loss predictor that quantifies per-step sample difficulty in real time. Guided by hard negative mining, it performs weighted sampling to prioritize high-loss regions, enabling adaptive weight updates and faster convergence. In inference, we design the Hierarchical Vision Task Segmenter (HVTS), which decomposes high-level task instructions into multi-stage low-level sub-instructions based on visual input. It adaptively segments action sequences into simple and complex subtasks by assigning shorter noise schedules with longer direct execution sequences to simple actions, and longer noise steps with shorter execution sequences to complex ones, thereby dramatically reducing computational overhead and significantly improving the early success rate.


Source: arXiv:2604.15938v1 - http://arxiv.org/abs/2604.15938v1 PDF: https://arxiv.org/pdf/2604.15938v1 Original Link: http://arxiv.org/abs/2604.15938v1

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
Apr 20, 2026
Topic:
Robotics
Area:
Robotics
Comments:
0
Bookmark
VADF: Vision-Adaptive Diffusion Policy Framework for Efficient Robotic Manipulation | Researchia