Research Paper (Researchia:202601.12419642)

Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning

Huan Li


Submitted: January 12, 2026
Subjects: Machine Learning

Description / Details

This paper studies the AdamW-style Shampoo optimizer, an effective implementation of classical Shampoo that notably won the external tuning track of the AlgoPerf neural network training algorithm competition. Our analysis unifies one-sided and two-sided preconditioning and establishes the convergence rate $\frac{1}{K}\sum_{k=1}^K E\left[\|\nabla f(X_k)\|_*\right]\leq O\left(\frac{\sqrt{m+n}\,C}{K^{1/4}}\right)$, measured in the nuclear norm, where $K$ is the number of iterations, $(m,n)$ is the size of the matrix parameters, and $C$ matches the constant in the optimal convergence rate of SGD. Theoretically, we have $\|\nabla f(X)\|_F\leq \|\nabla f(X)\|_*\leq \sqrt{m+n}\,\|\nabla f(X)\|_F$, so our rate can be regarded as analogous to the optimal SGD rate $\frac{1}{K}\sum_{k=1}^K E\left[\|\nabla f(X_k)\|_F\right]\leq O\left(\frac{C}{K^{1/4}}\right)$ in the ideal case $\|\nabla f(X)\|_*=\Theta(\sqrt{m+n})\,\|\nabla f(X)\|_F$.
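The norm sandwich $\|G\|_F\leq\|G\|_*\leq\sqrt{m+n}\,\|G\|_F$ can be checked numerically from singular values. The sketch below is only an illustration and is not taken from the paper: it assumes NumPy, and the names `norm_ratio`, `dense`, and `rank_one` are hypothetical. It compares the ratio $\|G\|_*/\|G\|_F$ for a generic dense matrix (ratio on the order of $\sqrt{\min(m,n)}$) with a rank-one matrix (ratio equal to 1). The upper bound follows from Cauchy-Schwarz applied to the at most $\min(m,n)$ nonzero singular values, and $\sqrt{\min(m,n)}\leq\sqrt{m+n}$.

```python
import numpy as np


def norm_ratio(G):
    """Return ||G||_* / ||G||_F, both computed from the singular values of G."""
    s = np.linalg.svd(G, compute_uv=False)
    nuclear = s.sum()                    # nuclear norm: sum of singular values
    frobenius = np.sqrt((s ** 2).sum())  # Frobenius norm: l2 norm of singular values
    return nuclear / frobenius


# Illustrative sizes only; the paper's (m, n) is the shape of a matrix parameter.
m, n = 64, 32
rng = np.random.default_rng(0)

dense = rng.standard_normal((m, n))  # generic full-rank "gradient"
rank_one = np.outer(rng.standard_normal(m), rng.standard_normal(n))  # rank-1 "gradient"

upper = np.sqrt(m + n)
for name, G in [("dense", dense), ("rank-one", rank_one)]:
    r = norm_ratio(G)
    # The sandwich ||G||_F <= ||G||_* <= sqrt(m+n) ||G||_F means 1 <= r <= sqrt(m+n).
    assert 1.0 - 1e-9 <= r <= upper + 1e-9
    print(f"{name}: ||G||_*/||G||_F = {r:.3f}, sqrt(m+n) = {upper:.3f}")
```

When the gradient is close to rank one the nuclear-norm and Frobenius-norm measures nearly coincide, whereas for well-spread singular values the ratio grows with the matrix size, which is the "ideal case" the abstract refers to.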
