
Adam Converges Without Any Modification On Update Rules

Yushun Zhang

Abstract

Adam is the default algorithm for training neural networks, including large language models (LLMs). However, \citet{reddi2019convergence} provided an example on which Adam diverges, raising concerns about its deployment in AI model training. We identify a key mismatch between the divergence example and practice: \citet{reddi2019convergence} pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1, \beta_2)$; whereas practical applications often fix the problem first and then tune $(\beta_1, \beta_2)$. In this work, we prove that Adam converges with proper problem-dependent hyperparameters. First, we prove that Adam converges when $\beta_2$ is large and $\beta_1 < \sqrt{\beta_2}$. Second, when $\beta_2$ is small, we point out a region of $(\beta_1, \beta_2)$ combinations where Adam can diverge to infinity. Our results indicate a phase transition for Adam from divergence to convergence when changing the $(\beta_1, \beta_2)$ combination. To our knowledge, this is the first phase transition in the $(\beta_1, \beta_2)$ 2D plane reported in the literature, providing rigorous theoretical guarantees for the Adam optimizer. We further point out that the critical boundary $(\beta_1^*, \beta_2^*)$ is problem-dependent and, in particular, dependent on batch size. This suggests how to tune $\beta_1$ and $\beta_2$: when Adam does not work well, we suggest tuning up $\beta_2$ inversely with batch size to surpass the threshold $\beta_2^*$, and then trying $\beta_1 < \sqrt{\beta_2}$. Our suggestions are supported by several empirical studies, which report improved LLM training performance when applying them.
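Since the abstract centers on Adam's unmodified update rule and the roles of $(\beta_1, \beta_2)$, here is a minimal sketch of the standard Adam step (exponential moving averages of the gradient and squared gradient, with bias correction), plus an illustrative check of the $\beta_1 < \sqrt{\beta_2}$ condition highlighted above. The toy quadratic objective, learning rate, and specific hyperparameter values are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch of the unmodified Adam update rule, to make the roles of
# (beta1, beta2) concrete. The toy objective and hyperparameters are
# illustrative assumptions, not values from the paper.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: EMAs of the gradient (m) and squared gradient (v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

if __name__ == "__main__":
    beta1, beta2 = 0.9, 0.999
    # Region suggested by the abstract: beta2 large and beta1 < sqrt(beta2).
    assert beta1 < np.sqrt(beta2)

    theta = np.array([5.0])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, 2001):
        grad = 2.0 * theta            # gradient of the toy objective f(theta) = theta^2
        theta, m, v = adam_step(theta, grad, m, v, t, beta1=beta1, beta2=beta2)
    print("final theta:", theta)      # approaches 0 on this convex toy problem
```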


Source: arXiv:2603.02092v1 (http://arxiv.org/abs/2603.02092v1)
PDF: https://arxiv.org/pdf/2603.02092v1

Submission: 3/4/2026
Subjects: Mathematics