
Latent Adversarial Regularization for Offline Preference Optimization

Enyi Jiang

Abstract

Learning from human feedback typically relies on preference optimization that constrains policy updates through token-level regularization. However, preference optimization for language models is particularly challenging because token-space similarity does not imply semantic or behavioral similarity. To address this challenge, we leverage latent-space regularization for language model preference optimization. We introduce GANPO, which achieves latent-space regularization by penalizing divergence between the internal representations of a policy model and a reference model. Because latent representations are not associated with explicit probability densities, we adopt an adversarial approach inspired by GANs to minimize latent-space divergence. We integrate GANPO as a regularizer into existing offline preference optimization objectives. Experiments across multiple model architectures and tasks show consistent improvements from latent-space regularization. Further, by comparing GANPO-induced inductive biases with those from token-level regularization, we find that GANPO provides more robust structural feedback under distributional shift and noise while maintaining comparable downstream performance with only minor computational overhead.
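The abstract does not spell out the training objective, but the recipe it describes, an adversarial penalty that pulls the policy's internal representations toward the reference model's, can be sketched as follows. This is a minimal illustration assuming a DPO-style base preference loss and a per-token MLP discriminator on hidden states; the names (`LatentDiscriminator`, `ganpo_losses`, `lam`) and the exact loss composition are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

class LatentDiscriminator(torch.nn.Module):
    """MLP critic scoring whether a latent came from the reference model."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(hidden_dim, hidden_dim),
            torch.nn.GELU(),
            torch.nn.Linear(hidden_dim, 1),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.net(h).squeeze(-1)  # one logit per token position


def dpo_loss(policy_logratio, ref_logratio, beta: float = 0.1):
    # Standard DPO objective on (chosen - rejected) log-probability differences.
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()


def ganpo_losses(disc, h_policy, h_ref, policy_logratio, ref_logratio, lam=0.1):
    """Return (policy loss, discriminator loss) for one batch.

    h_policy, h_ref: hidden states of shape (batch, seq, hidden_dim) from the
    trainable policy and the frozen reference model on the same inputs.
    """
    # Discriminator step: reference latents are "real", policy latents "fake".
    d_real = disc(h_ref.detach())
    d_fake = disc(h_policy.detach())
    d_loss = (
        F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
        + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    )

    # Policy step: preference loss plus a non-saturating GAN penalty that pulls
    # policy latents toward the reference latent distribution.
    g_fake = disc(h_policy)  # no detach: gradients reach the policy
    latent_reg = F.binary_cross_entropy_with_logits(g_fake, torch.ones_like(g_fake))
    policy_loss = dpo_loss(policy_logratio, ref_logratio) + lam * latent_reg
    return policy_loss, d_loss


# Toy usage with random tensors standing in for real model outputs.
B, T, D = 4, 16, 64
disc = LatentDiscriminator(D)
h_policy = torch.randn(B, T, D, requires_grad=True)  # would come from the policy
h_ref = torch.randn(B, T, D)                         # frozen reference latents
policy_lr, ref_lr = torch.randn(B), torch.randn(B)   # per-example log-ratios
policy_loss, d_loss = ganpo_losses(disc, h_policy, h_ref, policy_lr, ref_lr)
policy_loss.backward()  # latent penalty gradients flow into h_policy
```

In practice the discriminator and policy would be updated alternately, as in standard GAN training. Since the abstract says GANPO is integrated into existing offline objectives, the `dpo_loss` term above could presumably be swapped for any other offline preference loss.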


Source: arXiv:2601.22083v1 - http://arxiv.org/abs/2601.22083v1
PDF: https://arxiv.org/pdf/2601.22083v1

Submission: 1/29/2026
Subjects: Artificial Intelligence
