Research Paper Researchia:202601.30010 [Data Science > Machine Learning]

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

Hongyang Du

Abstract

While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drift. We hypothesize that these failures arise because standard denoising objectives lack explicit incentives for geometric coherence. To address this, we introduce VideoGPA (Video Geometric Preference Alignment), a data-efficient self-supervised framework that leverages a geometry foundation model to automatically derive dense preference signals that guide VDMs via Direct Preference Optimization (DPO). This approach effectively steers the generative distribution toward inherent 3D consistency without requiring human annotations. VideoGPA significantly enhances temporal stability, physical plausibility, and motion coherence using minimal preference pairs, consistently outperforming state-of-the-art baselines in extensive experiments.
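As a rough illustration of the preference-alignment idea described in the abstract, the sketch below pairs a hypothetical geometry-consistency score with the standard DPO objective, $-\log \sigma\big(\beta[(\log \pi_\theta(x_w) - \log \pi_{\text{ref}}(x_w)) - (\log \pi_\theta(x_l) - \log \pi_{\text{ref}}(x_l))]\big)$. The function names `make_preference_pair` and `dpo_loss`, the scalar scores, the value of `beta`, and the use of plain log-likelihoods are all assumptions made for this example; the abstract does not specify how VideoGPA scores videos or which diffusion-specific form of the loss it uses.

```python
import math

def make_preference_pair(samples, scores):
    """Pick (preferred, rejected) from candidate videos.

    `scores` are hypothetical geometry-consistency scores (higher = more
    3D-consistent), e.g. as might be derived from a geometry foundation model.
    """
    if len(samples) != len(scores) or len(samples) < 2:
        raise ValueError("need at least two samples with one score each")
    ranked = sorted(zip(scores, samples), key=lambda pair: pair[0])
    return ranked[-1][1], ranked[0][1]

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_*     : log-likelihood under the model being trained
    ref_logp_* : log-likelihood under the frozen reference model
    w = preferred sample, l = rejected sample
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) computed in a numerically stable way
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Usage: three candidate generations for one prompt, scored for consistency.
winner, loser = make_preference_pair(["vid_a", "vid_b", "vid_c"], [0.2, 0.9, 0.5])
loss = dpo_loss(logp_w=-10.0, logp_l=-12.0, ref_logp_w=-11.0, ref_logp_l=-11.0)
```

The loss falls below $\log 2$ once the trained model favors the preferred sample more strongly than the reference model does, which is the mechanism that would push generation toward higher-scoring (here, more geometrically consistent) outputs.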


Source: arXiv:2601.23286v1 (http://arxiv.org/abs/2601.23286v1)
PDF: https://arxiv.org/pdf/2601.23286v1

Submission: 1/30/2026
Subjects: Machine Learning; Data Science

