Research Paper · Researchia 202602.24045 · AI Agents > AI

Mean-Field Reinforcement Learning without Synchrony

Shan Yang

Abstract

Mean-field reinforcement learning (MF-RL) scales multi-agent RL to large populations by reducing each agent's dependence on others to a single summary statistic -- the mean action. However, this reduction requires every agent to act at every time step; when some agents are idle, the mean action is simply undefined. Addressing asynchrony therefore requires a different summary statistic -- one that remains defined regardless of which agents act. The population distribution μ ∈ Δ(O) -- the fraction of agents at each observation -- satisfies this requirement: its dimension is independent of N, and under exchangeability it fully determines each agent's reward and transition. Existing MF-RL theory, however, is built on the mean action and does not extend to μ. We therefore construct the Temporal Mean Field (TMF) framework around the population distribution μ from scratch, covering the full spectrum from fully synchronous to purely sequential decision-making within a single theory. We prove existence and uniqueness of TMF equilibria, establish an O(1/√N) finite-population approximation bound that holds regardless of how many agents act per step, and prove convergence of a policy gradient algorithm (TMF-PG) to the unique equilibrium. Experiments on a resource selection game and a dynamic queueing game confirm that TMF-PG achieves near-identical performance whether one agent or all N act per step, with approximation error decaying at the predicted O(1/√N) rate.
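As a minimal illustration of the contrast the abstract draws (the helper names are ours, not from the paper), the mean-action statistic is undefined when no agent acts in a step, while the population distribution μ ∈ Δ(O) is always defined and has dimension |O|, independent of N:

```python
import numpy as np

def mean_action(actions):
    """Mean-action summary statistic used by classical MF-RL.

    Undefined when no agent acts this step (hypothetical helper
    sketching the abstract's point, not the paper's code).
    """
    if len(actions) == 0:
        raise ValueError("mean action undefined: no agent acted this step")
    return np.mean(actions, axis=0)

def population_distribution(observations, n_obs):
    """Population distribution mu in Delta(O): the fraction of ALL
    agents currently at each discrete observation.

    Defined regardless of which (or how many) agents act this step;
    its dimension n_obs = |O| does not grow with the population size N.
    """
    counts = np.bincount(observations, minlength=n_obs)
    return counts / len(observations)

# N = 5 agents over |O| = 3 observations; suppose only one agent
# acts this step -- mu is still well defined from all 5 observations.
obs = np.array([0, 0, 1, 2, 1])
mu = population_distribution(obs, n_obs=3)
print(mu)  # fractions of agents at each observation: [0.4 0.4 0.2]
```

Under the exchangeability assumption stated in the abstract, this |O|-dimensional vector is all an agent needs to know about the rest of the population.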


Source: arXiv:2602.18026v1 (http://arxiv.org/abs/2602.18026v1)
PDF: https://arxiv.org/pdf/2602.18026v1

Submission: 2/24/2026
Subjects: AI; AI Agents
