ExplorerComputer VisionComputer Vision
Research PaperResearchia:202604.28006

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

Weijie Wang

Abstract

Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns video generation with 3D constraints through reinforcement learning. To facilitate this alignment, we introduce a specialized pure text dataset tailored for world simulation. Utiliz...

Submitted: April 28, 2026Subjects: Computer Vision; Computer Vision

Description / Details

Recent video foundation models demonstrate impressive visual synthesis but frequently suffer from geometric inconsistencies. While existing methods attempt to inject 3D priors via architectural modifications, they often incur high computational costs and limit scalability. We propose World-R1, a framework that aligns video generation with 3D constraints through reinforcement learning. To facilitate this alignment, we introduce a specialized pure text dataset tailored for world simulation. Utilizing Flow-GRPO, we optimize the model using feedback from pre-trained 3D foundation models and vision-language models to enforce structural coherence without altering the underlying architecture. We further employ a periodic decoupled training strategy to balance rigid geometric consistency with dynamic scene fluidity. Extensive evaluations reveal that our approach significantly enhances 3D consistency while preserving the original visual quality of the foundation model, effectively bridging the gap between video generation and scalable world simulation.


Source: arXiv:2604.24764v1 - http://arxiv.org/abs/2604.24764v1 PDF: https://arxiv.org/pdf/2604.24764v1 Original Link: http://arxiv.org/abs/2604.24764v1

Please sign in to join the discussion.

No comments yet. Be the first to share your thoughts!

Access Paper
View Source PDF
Submission Info
Date:
Apr 28, 2026
Topic:
Computer Vision
Area:
Computer Vision
Comments:
0
Bookmark