Research Paper (Researchia:202602.14036)

Diffusion Alignment Beyond KL: Variance Minimisation as Effective Policy Optimiser

Zijing Ou

Abstract

Submitted: February 14, 2026
Subjects: Machine Learning; Data Science

Diffusion alignment adapts pretrained diffusion models to sample from reward-tilted distributions along the denoising trajectory. This process naturally admits a Sequential Monte Carlo (SMC) interpretation, where the denoising model acts as a proposal and reward guidance induces importance weights. Motivated by this view, we introduce Variance Minimisation Policy Optimisation (VMPO), which formulates diffusion alignment as minimising the variance of log importance weights rather than directly optimising a Kullback-Leibler (KL) based objective. We prove that the variance objective is minimised by the reward-tilted target distribution and that, under on-policy sampling, its gradient coincides with that of standard KL-based alignment. This perspective offers a common lens for understanding diffusion alignment. Under different choices of potential functions and variance minimisation strategies, VMPO recovers various existing methods, while also suggesting new design directions beyond KL.
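The two central claims above (the variance objective is minimised by the reward-tilted target, and its on-policy gradient matches the KL gradient) can be checked numerically in a deliberately simplified setting. The sketch below is an illustration under our own assumptions, not the authors' implementation: it collapses the denoising trajectory to a single step over K discrete states, uses a softmax policy q_θ as the proposal, and defines the log importance weight as log w(x) = log p_ref(x) + r(x)/β − log q_θ(x) against the target π(x) ∝ p_ref(x) exp(r(x)/β). The names p_ref, r, beta, half_variance and kl, the 1/2 scaling of the variance, and the exact enumeration used in place of Monte Carlo samples are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy setup: one discrete "denoising step" over K states.
# p_ref stands in for the pretrained model, r for the reward, beta for the temperature.
K = 5
rng = np.random.default_rng(0)
p_ref = rng.dirichlet(np.ones(K))
r = rng.normal(size=K)
beta = 0.5

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

# Reward-tilted target pi(x) ∝ p_ref(x) * exp(r(x) / beta), kept unnormalised
# for the weights and normalised (log_pi) for the KL.
log_target_unnorm = np.log(p_ref) + r / beta
log_pi = log_target_unnorm - np.log(np.exp(log_target_unnorm).sum())

def log_weights(theta):
    """Log importance weights log w(x) = log p_ref(x) + r(x)/beta - log q_theta(x)."""
    return log_target_unnorm - np.log(softmax(theta))

def half_variance(theta, theta_sample):
    """0.5 * Var_{x ~ q_{theta_sample}}[log w_theta(x)], computed exactly by enumeration.

    theta_sample fixes the sampling distribution (a stop-gradient), so
    differentiating w.r.t. theta alone models on-policy training where samples
    come from the current model but are not differentiated through.
    """
    q_s = softmax(theta_sample)
    lw = log_weights(theta)
    mean = np.dot(q_s, lw)
    return 0.5 * np.dot(q_s, (lw - mean) ** 2)

def kl(theta):
    """KL(q_theta || pi), the KL-based alignment objective in this toy."""
    q = softmax(theta)
    return np.dot(q, np.log(q) - log_pi)

def num_grad(f, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at theta."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

theta = rng.normal(size=K)  # an arbitrary current policy

# 1) At q = pi every log weight equals log Z, so the variance is (numerically) zero.
print(half_variance(log_pi, log_pi))

# 2) With samples held fixed at the current policy, the half-variance gradient
#    and the KL gradient agree up to finite-difference error.
g_var = num_grad(lambda t: half_variance(t, theta), theta)
g_kl = num_grad(kl, theta)
print(np.max(np.abs(g_var - g_kl)))
```

Passing theta_sample separately acts as a stop-gradient on the sampling distribution, which is how on-policy sampling is modelled here. Differentiating through the sampling distribution as well would add an extra score-function term proportional to E_q[(log w − E_q[log w])² ∇_θ log q_θ], so in general the gradients would no longer match; dropping the 1/2 factor simply doubles the variance gradient.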


Source: arXiv:2602.12229v1 (http://arxiv.org/abs/2602.12229v1)
PDF: https://arxiv.org/pdf/2602.12229v1
