Research Paper
Researchia:202601.29022

Latent Adversarial Regularization for Offline Preference Optimization

Enyi Jiang

Submitted: January 29, 2026
Subjects: Artificial Intelligence

Description / Details

Learning from human feedback typically relies on preference optimization that constrains policy updates through token-level regularization. However, preference optimization for language models is particularly challenging because token-space similarity does not imply semantic or behavioral similarity. To address this challenge, we leverage latent-space regularization for language model preference optimization. We introduce GANPO, which achieves latent-space regularization by penalizing divergence between the internal representations of a policy model and a reference model. Given that latent representations are not associated with explicit probability densities, we adopt an adversarial approach inspired by GANs to minimize latent-space divergence. We integrate GANPO as a regularizer into existing offline preference optimization objectives. Experiments across multiple model architectures and tasks show consistent improvements from latent-space regularization. Further, by comparing GANPO-induced inferential biases with those from token-level regularization, we find that GANPO provides more robust structural feedback under distributional shift and noise while maintaining comparable downstream performance with minor computational overhead.
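
To make the idea of an adversarial latent-space regularizer concrete, the following is a minimal sketch and not the paper's implementation. It assumes DPO as the offline preference objective, a linear discriminator over a single latent vector, and a non-saturating GAN-style penalty; the abstract specifies none of these, and every function name and parameter here (discriminator_logit, adversarial_penalty, lam, beta) is hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def discriminator_logit(h, w, b):
    """Assumed linear discriminator; a positive logit means 'looks like a reference latent'."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def discriminator_loss(h_ref, h_pol, w, b):
    """Binary cross-entropy: reference latents are labelled 1, policy latents 0."""
    return (-log_sigmoid(discriminator_logit(h_ref, w, b))
            - log_sigmoid(-discriminator_logit(h_pol, w, b)))


def adversarial_penalty(h_pol, w, b):
    """Non-saturating generator-style loss: small when the policy latent
    is scored as reference-like by the discriminator."""
    return -log_sigmoid(discriminator_logit(h_pol, w, b))


def regularized_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected,
                     h_pol, w, b, beta=0.1, lam=0.5):
    """Preference loss plus a weighted latent-space adversarial penalty (assumed form)."""
    base = dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta)
    return base + lam * adversarial_penalty(h_pol, w, b)


if __name__ == "__main__":
    w, b = [1.0, 0.0], 0.0        # toy discriminator weights
    h_near = [2.0, 0.0]           # policy latent the discriminator rates as reference-like
    h_far = [-2.0, 0.0]           # policy latent the discriminator rates as policy-like
    print(regularized_loss(-1.0, -2.0, -1.5, -1.5, h_near, w, b))
    print(regularized_loss(-1.0, -2.0, -1.5, -1.5, h_far, w, b))
```

In a training loop under these assumptions, the discriminator would be updated on discriminator_loss while the policy is updated on regularized_loss, so the policy is pushed toward latents the discriminator cannot tell apart from the reference model's.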


Source: arXiv:2601.22083v1 - http://arxiv.org/abs/2601.22083v1
PDF: https://arxiv.org/pdf/2601.22083v1
