Research Paper Researchia:202604.18063

Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

Come Fiegel

Submitted: April 18, 2026
Subjects: Machine Learning; Data Science

Description / Details

We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound of Omega(t^{-1/4}) on the exploitability gap. Several online mirror descent algorithms have been proposed in the literature for this problem, but none has attained this rate so far. We show that log-barrier regularization, combined with a dual-focused analysis, achieves this O-tilde(t^{-1/4}) convergence with high probability. We additionally extend our idea to extensive-form games, proving a bound with the same rate.
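
The convergence rates above are stated in terms of the exploitability gap. As a minimal sketch, the snippet below computes that quantity for a pair of policies in a small matrix game, assuming the standard definition (best-response value of the maximizer minus best-response value of the minimizer). The function name `exploitability_gap`, the payoff convention, and the matching-pennies example are illustrative assumptions, not taken from the paper, and this is not the paper's learning algorithm.

```python
def exploitability_gap(A, x, y):
    """Exploitability gap of the policy pair (x, y) in the zero-sum matrix game A.

    Assumed convention (not from the paper): the row player samples i ~ x and
    maximizes A[i][j]; the column player samples j ~ y and minimizes it.
    """
    n_rows, n_cols = len(A), len(A[0])
    # Expected payoff of each pure row action against the column policy y.
    row_values = [sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
    # Expected payoff of each pure column action against the row policy x.
    col_values = [sum(x[i] * A[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # What the maximizer could gain by deviating plus what the minimizer could gain.
    return max(row_values) - min(col_values)


# Hypothetical example game: matching pennies, whose minimax policy is uniform.
MATCHING_PENNIES = [[1.0, -1.0], [-1.0, 1.0]]

uniform_gap = exploitability_gap(MATCHING_PENNIES, [0.5, 0.5], [0.5, 0.5])
pure_gap = exploitability_gap(MATCHING_PENNIES, [1.0, 0.0], [1.0, 0.0])

print(uniform_gap)  # gap at the minimax policies
print(pure_gap)     # gap when both players always pick their first action
```

The gap is zero exactly at a pair of minimax policies and positive otherwise, so a last-iterate guarantee of order t^{-1/4} means the policies actually played at round t, rather than their running average, have a gap shrinking at that rate.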


Source: arXiv:2604.15242v1 - http://arxiv.org/abs/2604.15242v1
PDF: https://arxiv.org/pdf/2604.15242v1
