OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality
Abstract
The Exponential Moving Average (EMA) is a cornerstone of widely used optimizers such as Adam. However, existing theoretical analyses of Adam-style methods have notable limitations: their guarantees can remain suboptimal in the zero-noise regime, rely on restrictive boundedness conditions (e.g., bounded gradients or objective gaps), use constant or open-loop stepsizes, or require prior knowledge of Lipschitz constants. To overcome these bottlenecks, we introduce OptEMA and analyze two novel variants: OptEMA-M, which applies an adaptive, decreasing EMA coefficient to the first-order moment with a fixed second-order decay, and OptEMA-V, which swaps these roles. Crucially, OptEMA is closed-loop and Lipschitz-free in the sense that its effective stepsizes are trajectory-dependent and do not require the Lipschitz constant for parameterization. Under standard stochastic gradient descent (SGD) assumptions, namely smoothness, a lower-bounded objective, and unbiased gradients with bounded variance, we establish rigorous convergence guarantees. Both variants achieve a noise-adaptive convergence rate for the average gradient norm that scales with the noise level σ. In particular, in the zero-noise regime where σ = 0, our bounds reduce to the nearly optimal deterministic rate without manual hyperparameter retuning.
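To make the distinction between the two variants concrete, here is a minimal sketch of what an OptEMA-M-style step could look like, assuming an Adam-like template. The first-moment mixing weight alpha_t = 1/t is a hypothetical stand-in for the paper's adaptive, closed-loop coefficient (it decreases over time, turning the first moment into a running mean of past gradients), while the second-moment decay beta2 stays fixed; OptEMA-V would swap these roles. The function name, the schedule, and the bias correction are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def optema_m_step(theta, grad, m, v, t, lr=1e-2, beta2=0.999, eps=1e-8):
    """One Adam-style step with a decreasing first-moment EMA coefficient
    (OptEMA-M-style sketch). The schedule alpha_t = 1/t is a hypothetical
    placeholder; the paper's closed-loop coefficient is not reproduced here.
    """
    alpha_t = 1.0 / t                              # decreasing weight on the fresh gradient
    m = (1.0 - alpha_t) * m + alpha_t * grad       # first moment: adaptive, decreasing EMA
    v = beta2 * v + (1.0 - beta2) * grad ** 2      # second moment: fixed decay beta2
    v_hat = v / (1.0 - beta2 ** t)                 # standard bias correction for v
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)  # preconditioned update
    return theta, m, v
```

With alpha_t = 1/t, the first moment m after t steps is exactly the uniform average of the gradients seen so far, which is one simple way a decreasing EMA coefficient can suppress gradient noise without a tuned beta1.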
Source: arXiv:2603.09923v1 (http://arxiv.org/abs/2603.09923v1). PDF: https://arxiv.org/pdf/2603.09923v1