Research Paper Researchia:202601.12419642 [Machine Learning]
Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning
Huan Li
Abstract
This paper studies the AdamW-style Shampoo optimizer, an effective implementation of classical Shampoo that notably won the external tuning track of the AlgoPerf neural network training algorithm competition. Our analysis unifies one-sided and two-sided preconditioning and establishes a convergence rate measured in the nuclear norm, stated in terms of the iteration number, the size of the matrix parameters, and the constant appearing in the optimal convergence rate of SGD. Theoretically, this supports regarding our convergence rate as analogous to the optimal convergence rate of SGD in the ideal case.
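To make the one-sided versus two-sided distinction concrete, here is a minimal NumPy sketch of a classical Shampoo step with decoupled (AdamW-style) weight decay. This follows the standard classical Shampoo recipe (accumulate GG^T and G^T G, precondition by their inverse fourth roots), not necessarily the exact variant analyzed in this paper; the function name, the `eps` stabilizer, and the hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def matrix_power(M, p, eps=1e-8):
    """Symmetric matrix power M^p via eigendecomposition.

    eps is an illustrative stabilizer keeping eigenvalues positive.
    """
    w, V = np.linalg.eigh(M)
    w = np.maximum(w, 0.0) + eps
    return (V * w**p) @ V.T

def shampoo_step(W, G, L, R, lr=0.1, wd=0.0, two_sided=True):
    """One classical Shampoo step on a matrix parameter W with gradient G.

    L and R accumulate G G^T and G^T G. Two-sided preconditioning applies
    L^{-1/4} G R^{-1/4}; one-sided preconditioning applies L^{-1/2} G only.
    Weight decay is decoupled (applied to W directly, AdamW-style).
    """
    L += G @ G.T
    if two_sided:
        R += G.T @ G
        update = matrix_power(L, -0.25) @ G @ matrix_power(R, -0.25)
    else:
        update = matrix_power(L, -0.5) @ G
    W -= lr * (update + wd * W)
    return W, L, R
```

The `two_sided` flag mirrors the unification discussed in the abstract: the two branches differ only in whether the right preconditioner R is maintained and in the exponent applied to L.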
Submission: 1/12/2026
Subjects: Machine Learning
Cite as: Researchia:202601.12419642, https://www.researchia.net/explorer/d329e554-436c-4b90-a5da-5958768895af