Research Paper | Researchia:202605.11091

NoiseGate: Learning Per-Latent Timestep Schedules as Information Gating in World Action Models

Wen Huang


Submitted: May 11, 2026
Subjects: Robotics

Abstract

World Action Models (WAMs) are an emerging family of policies that tie robot action generation to future-observation modeling. In this work, we focus on the joint video-action modeling paradigm, in which actions and imagined future observations are co-generated along a shared denoising or flow trajectory, so that perception, prediction, and control are coupled within a single generative process. Existing WAMs typically realize this paradigm with a Mixture-of-Transformers (MoT), in which video and action tokens interact through shared self-attention. In principle, this architecture can assign a separate timestep $t_f$ to each predicted latent frame $f$, yet current systems collapse this degree of freedom onto a single shared scalar $t$. Under the noise-as-masking view of Diffusion Forcing, this shared schedule imposes the unjustified prior that every predicted latent is equally reliable for action generation. We instead view the per-latent schedule as a learnable information-gating policy: by changing a latent frame's noise level, the policy modulates the reliability of that frame's Key/Value contribution to the action tokens. We propose NoiseGate, which combines (i) independent per-latent timestep sampling during backbone training, (ii) a lightweight Gating Policy Network that emits per-latent time increments during denoising, and (iii) task-reward optimization that trains the schedule policy without hand-crafted shape priors. Built on a joint video-action MoT backbone, NoiseGate delivers consistent gains on diverse RoboTwin random-scene manipulation tasks.
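The abstract names three ingredients without giving their exact form. The sketch below shows one way they could fit together; it is not the authors' implementation. It assumes a flow-style interpolation in which $t_f = 0$ is pure noise and $t_f = 1$ is a clean latent, and it stands in for the Gating Policy Network with a hypothetical linear Gaussian policy over three hand-picked per-latent features (current $t_f$, frame position, bias). Task-reward optimization is illustrated with plain REINFORCE, a swapped-in choice; the paper's actual objective may differ. All function names (`sample_training_timesteps`, `noise_latents`, `rollout_schedule`, `reinforce_step`) are illustrative.

```python
import numpy as np


def sample_training_timesteps(batch, num_frames, rng, shared=False):
    """Return timesteps t[b, f] in [0, 1) for each predicted latent frame.

    shared=True reproduces the usual single scalar t per sample;
    shared=False draws an independent t_f for every latent frame.
    """
    if shared:
        return np.repeat(rng.uniform(size=(batch, 1)), num_frames, axis=1)
    return rng.uniform(size=(batch, num_frames))


def noise_latents(x0, noise, t):
    """Interpolate clean latents and noise with a per-frame t.

    Assumed convention: t=0 is pure noise, t=1 is the clean latent.
    x0, noise: (batch, frames, dim); t: (batch, frames).
    """
    t = t[..., None]  # broadcast each frame's t over the latent dimension
    return t * x0 + (1.0 - t) * noise


def softplus(z):
    return np.logaddexp(0.0, z)  # numerically stable log(1 + exp(z))


def schedule_features(t):
    """Hypothetical per-latent features: current time, frame position, bias."""
    num_frames = t.shape[0]
    pos = np.linspace(0.0, 1.0, num_frames)
    return np.stack([t, pos, np.ones(num_frames)], axis=1)  # shape (F, 3)


def rollout_schedule(w, num_frames, num_steps, sigma, rng):
    """Roll out a stochastic per-latent schedule with a linear Gaussian policy.

    Returns the schedule history, shape (num_steps + 1, F), and the summed
    gradient of log pi(z | t) with respect to w, which REINFORCE scales
    by the task reward.
    """
    t = np.zeros(num_frames)
    history = [t.copy()]
    grad_logp = np.zeros_like(w)
    for _ in range(num_steps):
        feats = schedule_features(t)
        mu = feats @ w  # policy mean for each latent's pre-activation
        z = mu + sigma * rng.standard_normal(num_frames)
        dt = softplus(z) / num_steps  # strictly positive per-latent increments
        # A joint video-action backbone would be called here with the
        # per-latent times t; that model call is omitted in this sketch.
        t = np.minimum(t + dt, 1.0)  # never step past the clean end
        # Gradient of the Gaussian log-density with respect to w.
        grad_logp += feats.T @ ((z - mu) / sigma**2)
        history.append(t.copy())
    return np.stack(history), grad_logp


def reinforce_step(w, reward, baseline, grad_logp, lr=1e-2):
    """One REINFORCE update: w <- w + lr * (R - b) * grad log pi."""
    return w + lr * (reward - baseline) * grad_logp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = np.zeros(3)  # untrained gating policy
    history, grad = rollout_schedule(w, num_frames=4, num_steps=8, sigma=0.5, rng=rng)
    print(history[-1])  # per-latent times after 8 denoising steps
    # In practice the reward would come from executing the decoded actions.
    w = reinforce_step(w, reward=1.0, baseline=0.5, grad_logp=grad)
```

Setting `shared=True` recovers the single-scalar schedule the abstract argues against. The rollout only constrains each latent's time to move monotonically toward the clean end, so a trained policy is free to advance some frames early (making their keys and values more reliable for the action tokens) while keeping others noisy.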


Source: arXiv:2605.07794v1 (http://arxiv.org/abs/2605.07794v1)
PDF: https://arxiv.org/pdf/2605.07794v1
