Research Paper
Researchia:202606.08004

Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization

Ming Sun

Submitted: June 8, 2026
Subjects: Machine Learning; Data Science

Description / Details

Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, where agents communicate only with their neighbors and no central coordinator is required. For strongly convex problems, communication efficiency is mainly determined by the condition number \(κ = L/μ\) and the network spectral gap \(1-β\). Although deterministic decentralized methods can simultaneously achieve accelerated \(\sqrt{κ}\) and \(1/\sqrt{1-β}\) dependences, no existing stochastic method attains both improvements at once. In this paper, we propose Multi-Gossip Accelerated DSGD (MG-ADSGD), a decentralized stochastic algorithm that combines Nesterov-type primal-dual extrapolation with multi-round fast gossip averaging. The key idea is to couple the gossip depth with the mini-batch size so that additional communication rounds simultaneously improve consensus accuracy and reduce gradient variance. We show that MG-ADSGD achieves the communication complexity

\[ \widetilde{\mathcal{O}}\!\left( \frac{σ^2}{μ n ε}\log\frac{1}{ε} + \sqrt{\frac{κ}{1-β}}\,\log\frac{1}{ε} \right), \]

where \(ε\) denotes the target accuracy, \(n\) is the number of nodes, and \(σ^2\) is the gradient variance. To the best of our knowledge, this bound yields the best currently available communication complexity for decentralized stochastic strongly convex optimization, up to logarithmic factors that are independent of \(ε\).
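
To make the two ingredients of the abstract concrete, the following is a minimal toy sketch in Python, not the authors' MG-ADSGD. It assumes scalar quadratic local objectives, a ring network with lazy averaging weights, plain repeated mixing in place of the paper's fast gossip, and no Nesterov extrapolation. The rule "mini-batch size equals gossip depth" is a hypothetical stand-in for the paper's coupling, which the abstract does not specify. All names (`comm_bound`, `ring_gossip`, `minibatch_grad`, `run`) are illustrative. `comm_bound` evaluates only the two leading terms of the stated bound, with constants and hidden logarithmic factors dropped.

```python
import math
import random


def comm_bound(sigma2, mu, L, n, beta, eps):
    """Leading terms of the stated bound (constants and hidden logs dropped)."""
    kappa = L / mu                      # condition number
    log_term = math.log(1.0 / eps)
    stochastic = sigma2 / (mu * n * eps) * log_term
    network = math.sqrt(kappa / (1.0 - beta)) * log_term
    return stochastic, network


def ring_gossip(values, rounds):
    """Apply `rounds` steps of lazy ring averaging (weights 1/2, 1/4, 1/4).

    The mixing matrix is doubly stochastic, so the network average is
    preserved while disagreement between agents shrinks each round.
    """
    n = len(values)
    x = list(values)
    for _ in range(rounds):
        x = [0.5 * x[i] + 0.25 * x[(i - 1) % n] + 0.25 * x[(i + 1) % n]
             for i in range(n)]
    return x


def minibatch_grad(x, a, c, batch, sigma, rng):
    """Average `batch` noisy gradients of 0.5*a*(x-c)^2.

    Averaging independent samples divides the noise variance by `batch`.
    """
    return sum(a * (x - c) + rng.gauss(0.0, sigma) for _ in range(batch)) / batch


def spread(values):
    """Largest deviation of any agent from the network average."""
    m = sum(values) / len(values)
    return max(abs(v - m) for v in values)


def run(n=8, steps=200, depth=4, step_size=0.1, sigma=1.0, seed=0):
    """Toy decentralized SGD: local mini-batch step, then `depth` gossip rounds.

    ASSUMPTION: batch size is set equal to `depth`; this is only an
    illustration of coupling the two, not the paper's actual rule.
    """
    rng = random.Random(seed)
    a = [1.0 + (i % 3) for i in range(n)]   # local curvatures (hypothetical data)
    c = [float(i) for i in range(n)]        # local minimizers (hypothetical data)
    x = [0.0] * n
    for _ in range(steps):
        g = [minibatch_grad(x[i], a[i], c[i], depth, sigma, rng) for i in range(n)]
        x = ring_gossip([x[i] - step_size * g[i] for i in range(n)], depth)
    # Minimizer of the average objective (1/n) * sum_i 0.5*a_i*(x - c_i)^2.
    x_star = sum(ai * ci for ai, ci in zip(a, c)) / sum(a)
    return x, x_star


if __name__ == "__main__":
    print(comm_bound(sigma2=1.0, mu=1.0, L=100.0, n=10, beta=0.99, eps=0.01))
    x, x_star = run()
    print("agent spread:", spread(x), "network mean:", sum(x) / len(x), "x*:", x_star)
```

In this sketch, raising `depth` both tightens consensus (more mixing rounds) and lowers gradient noise (larger batch), which is the qualitative effect the abstract describes. Because the sketch uses a constant step size and no acceleration, it should not be expected to reproduce the rates claimed for MG-ADSGD.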


Source: arXiv:2606.07496v1 (http://arxiv.org/abs/2606.07496v1)
PDF: https://arxiv.org/pdf/2606.07496v1
