Research Paper | Researchia:202605.23005

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning

Lee Hsin-Ying

Submitted: May 23, 2026
Subjects: Computer Vision

Description / Details

Current motion-controlled image-to-video generation models rigidly follow user-provided trajectories that are often sparse, imprecise, and causally incomplete. Such reliance often yields unnatural or implausible outcomes, especially by missing secondary causal consequences. To address this, we introduce MotiMotion, a novel framework that reformulates motion control as a reasoning-then-generation problem. To encourage causally grounded and commonsense-consistent interactions, we leverage a training-free vision-language reasoner to refine image-space coordinates of primary trajectories and to hallucinate plausible secondary motions. To further improve motion naturalness, we propose a confidence-aware control scheme that modulates guidance strength, enabling the model to closely follow high-confidence plans while correcting artifacts under low-confidence inputs with its internal generative priors. To support systematic evaluation, we curate a new image-to-video benchmark, MotiBench, consisting of interaction-centric scenes where new events are triggered by motion. Both a VLM-based evaluation and a human study on MotiBench demonstrate that MotiMotion produces videos with more plausible object behaviors and interactions, and that it is preferred over existing approaches.
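
The abstract does not give the exact form of the confidence-aware control scheme, so the following is only a minimal illustrative sketch of the general idea: map a per-trajectory confidence score to a guidance strength, then use that strength to interpolate between the model's unconstrained (prior) prediction and its trajectory-conditioned prediction. The names `PlannedTrajectory`, `guidance_strength`, and `blend_prediction`, the linear mapping, and the `min_strength`/`max_strength` bounds are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # image-space (x, y) coordinate


@dataclass
class PlannedTrajectory:
    """Hypothetical container for one trajectory produced by the reasoner."""
    points: List[Point]         # one (x, y) position per frame
    confidence: float           # assumed reasoner confidence in [0, 1]
    is_secondary: bool = False  # True for inferred secondary motions


def guidance_strength(confidence: float,
                      min_strength: float = 0.2,
                      max_strength: float = 1.0) -> float:
    """Map confidence to a guidance weight (assumed linear mapping).

    High confidence gives a weight near max_strength (follow the plan
    closely); low confidence gives a weight near min_strength (lean on
    the generator's own prior).
    """
    c = min(max(confidence, 0.0), 1.0)  # clamp to [0, 1]
    return min_strength + c * (max_strength - min_strength)


def blend_prediction(prior_pred: List[float],
                     controlled_pred: List[float],
                     strength: float) -> List[float]:
    """Interpolate element-wise from the prior toward the controlled output."""
    return [p + strength * (c - p) for p, c in zip(prior_pred, controlled_pred)]


if __name__ == "__main__":
    plan = PlannedTrajectory(points=[(10.0, 20.0), (12.0, 21.0)], confidence=0.5)
    w = guidance_strength(plan.confidence)
    print(w, blend_prediction([0.0, 2.0], [1.0, 4.0], w))
```

In this sketch a low-confidence plan still influences the output (through `min_strength`) rather than being discarded; whether the actual method keeps such a floor, or applies the modulation per trajectory, per frame, or per denoising step, is not stated in the abstract.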


Source: arXiv:2605.22818v1 - http://arxiv.org/abs/2605.22818v1
PDF: https://arxiv.org/pdf/2605.22818v1
