Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm

Cited by: 0
Authors
Zhu, Miaoxi [1 ,2 ]
Shen, Li [3 ]
Du, Bo [1 ,2 ]
Tao, Dacheng [4 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Natl Engn Res Ctr Multimedia Software, Inst Artificial Intelligence, Wuhan, Peoples R China
[2] Wuhan Univ, Hubei Key Lab Multimedia & Network Commun Engn, Wuhan, Peoples R China
[3] JD Explore Acad, Beijing, Peoples R China
[4] Univ Sydney, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The growing size of available data has attracted increasing interest in solving minimax problems in a decentralized manner for various machine learning tasks. Previous theoretical research has primarily focused on the convergence rate and communication complexity of decentralized minimax algorithms, with little attention given to their generalization. In this paper, we investigate the primal-dual generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm via algorithmic stability, under both convex-concave and nonconvex-nonconcave settings. Our theory refines algorithmic stability for the decentralized setting and demonstrates that the decentralized structure does not destroy the stability and generalization of D-SGDA, implying that D-SGDA can generalize as well as vanilla SGDA in certain situations. Our results characterize the impact of different topologies on the generalization bound of D-SGDA beyond trivial factors such as sample size, learning rate, and number of iterations. We also evaluate the optimization error and balance it against the generalization gap to obtain the optimal population risk of D-SGDA in the convex-concave setting. Finally, several numerical experiments validate our theoretical findings.
Pages: 35
Related Papers
50 records in total
  • [31] The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
    Park, Daniel S.
    Sohl-Dickstein, Jascha
Le, Quoc V.
    Smith, Samuel L.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [32] Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
    Cao, Yuan
    Gu, Quanquan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [33] Data-Dependent Stability of Stochastic Gradient Descent
    Kuzborskij, Ilja
    Lampert, Christoph H.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [34] Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
    Bassily, Raef
    Feldman, Vitaly
    Guzman, Cristobal
    Talwar, Kunal
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [35] A new stochastic gradient descent possibilistic clustering algorithm
    Koutsimpela, Angeliki
    Koutroumbas, Konstantinos D.
    AI COMMUNICATIONS, 2022, 35 (02) : 47 - 64
  • [36] Fast Convergence Stochastic Parallel Gradient Descent Algorithm
Hu, Dongting
Shen, Wen
Ma, Wenchao
Liu, Xinyu
Su, Zhouping
Zhu, Huaxin
Zhang, Xiumei
Que, Lizhi
Zhu, Zhuowei
Zhang, Yixin
Chen, Guoqing
Hu, Lifa
    LASER & OPTOELECTRONICS PROGRESS, 2019, 56 (12)
  • [37] Guided Stochastic Gradient Descent Algorithm for inconsistent datasets
    Sharma, Anuraganand
    APPLIED SOFT COMPUTING, 2018, 73 : 1068 - 1080
  • [38] Stochastic Approximate Gradient Descent via the Langevin Algorithm
    Qiu, Yixuan
    Wang, Xiao
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5428 - 5435
  • [39] A stochastic gradient descent algorithm for structural risk minimisation
Ratsaby, J.
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2003, 2842 : 205 - 220
  • [40] The Improved Stochastic Fractional Order Gradient Descent Algorithm
    Yang, Yang
    Mo, Lipo
    Hu, Yusen
    Long, Fei
    FRACTAL AND FRACTIONAL, 2023, 7 (08)