Random Reshuffling Dominates Stochastic Gradient Descent
Abstract
Stochastic Gradient Descent ($\textsf{SGD}$) is one of the most classical optimization algorithms with favorable theoretical guarantees, yet the practical implementation of $\textsf{SGD}$ differs subtly from its well-known form and is often referred to as Shuffling Stochastic Gradient Descent ($\textsf{Shuffling SGD}$). A particularly popular strategy in $\textsf{Shuffling SGD}$ is Random Reshuffling ($\textsf{RR}$), which has achieved great empirical success across numerous experiments. Despite...
Description / Details
Stochastic Gradient Descent () is one of the most classical optimization algorithms with favorable theoretical guarantees, yet the practical implementation of differs subtly from its well-known form and is often referred to as Shuffling Stochastic Gradient Descent (). A particularly popular strategy in is Random Reshuffling (), which has achieved great empirical success across numerous experiments. Despite its strong performance, has long been considered a heuristic due to a lack of theoretical support. Over the last decade, people have finally established provable convergence rates for , thus justifying its observed superiority. However, for smooth convex optimization, two clouds over the convergence theory of remain to this day. More precisely, according to the current theory, under converges only when the stepsize is smaller than a threshold proportional to , where is the number of summands in the objective (or the number of data points). Consequently, the optimally tuned theoretical rate of under is strictly worse than that of when the number of epochs is smaller than another threshold proportional to . These two restrictions heavily limit the applicability of existing theories and leave a critical mismatch with practice. In this work, for the first time, we prove that dominates in smooth convex optimization under any reasonable stepsize after any finite number of epochs, thereby addressing a longstanding open question.
Source: arXiv:2606.32005v1 - http://arxiv.org/abs/2606.32005v1 PDF: https://arxiv.org/pdf/2606.32005v1 Original Link: http://arxiv.org/abs/2606.32005v1
Please sign in to join the discussion.
No comments yet. Be the first to share your thoughts!
Jul 1, 2026
Mathematics
Mathematics
0