
Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies

Zhuoran Li

Abstract

Online Multi-Agent Reinforcement Learning (MARL) is a prominent framework for efficient agent coordination. Crucially, enhancing policy expressiveness is pivotal for achieving superior performance. Diffusion-based generative models are well-positioned to meet this demand, having demonstrated remarkable expressiveness and multimodal representation in image generation and offline settings. Yet, their potential in online MARL remains largely under-explored. A major obstacle is that the intractable likelihoods of diffusion models impede entropy-based exploration and coordination. To tackle this challenge, we propose OMAD, among the first Online off-policy MARL frameworks using Diffusion policies to orchestrate coordination. Our key innovation is a relaxed policy objective that maximizes a scaled joint entropy, facilitating effective exploration without relying on tractable likelihoods. Complementing this, within the centralized training with decentralized execution (CTDE) paradigm, we employ a joint distributional value function to optimize decentralized diffusion policies. It leverages tractable entropy-augmented targets to guide the simultaneous updates of the diffusion policies, thereby ensuring stable coordination. Extensive evaluations on MPE and MAMuJoCo establish our method as the new state of the art across 10 diverse tasks, demonstrating a remarkable 2.5× to 5× improvement in sample efficiency.

Submitted: February 24, 2026
Subjects: Artificial Intelligence (AI)
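The entropy-augmented target mentioned in the abstract follows the general pattern of maximum-entropy RL. The sketch below is purely illustrative and not taken from the paper: the function name, coefficient values, and the joint-entropy estimate are assumptions. In OMAD the joint entropy is only a scaled surrogate, since exact diffusion-policy likelihoods are intractable; here it is treated as a given scalar estimate.

```python
import numpy as np

def entropy_augmented_target(reward, next_q, joint_entropy,
                             alpha=0.2, gamma=0.99, done=False):
    """Soft (entropy-augmented) TD target in the max-entropy style:
        y = r + gamma * (1 - done) * (Q(s', a') + alpha * H_joint(s'))
    `joint_entropy` stands in for a scaled joint-entropy estimate of the
    agents' diffusion policies (illustrative only; not the paper's exact
    estimator)."""
    return reward + gamma * (1.0 - float(done)) * (next_q + alpha * joint_entropy)

# Toy usage with made-up numbers:
y = entropy_augmented_target(reward=1.0, next_q=5.0, joint_entropy=0.5)
# y = 1.0 + 0.99 * (5.0 + 0.2 * 0.5) = 6.049
print(y)
```

A distributional critic would replace the scalar `next_q` with a return distribution, but the entropy bonus enters the target the same way.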


Source: arXiv:2602.18291v1 (http://arxiv.org/abs/2602.18291v1)
PDF: https://arxiv.org/pdf/2602.18291v1

Submission Info
Date: Feb 24, 2026
Topic: Artificial Intelligence
Area: AI