Stochastic modified equations for the asynchronous stochastic gradient descent

Cited by: 9
Authors
An, Jing [1 ]
Lu, Jianfeng [2 ,3 ]
Ying, Lexing [4 ,5 ]
Affiliations
[1] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
[2] Duke Univ, Dept Math, Dept Chem, Box 90320, Durham, NC 27706 USA
[3] Duke Univ, Dept Phys, Box 90320, Durham, NC 27706 USA
[4] Stanford Univ, Dept Math, Stanford, CA 94305 USA
[5] Stanford Univ, Inst Computat & Math Engn ICME, Stanford, CA 94305 USA
Funding
U.S. National Science Foundation;
Keywords
stochastic modified equations; asynchronous stochastic gradient descent; optimal control;
DOI
10.1093/imaiai/iaz030
Chinese Library Classification
O29 [Applied Mathematics];
Discipline code
070104;
Abstract
We propose stochastic modified equations (SMEs) for modelling asynchronous stochastic gradient descent (ASGD) algorithms. The resulting SME, of Langevin type, extracts more information about the ASGD dynamics and elucidates the relationship between different types of stochastic gradient algorithms. We show convergence of ASGD to the SME in the continuous-time limit, as well as the SME's precise prediction of ASGD trajectories under various forcing terms. As an application, we propose an optimal mini-batching strategy for ASGD by solving the optimal control problem of the associated SME.
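The abstract's central idea, approximating a discrete stochastic-gradient iteration by a continuous Langevin-type SDE, can be illustrated with a toy sketch. This is a generic SME for plain SGD on a one-dimensional quadratic, not the paper's ASGD derivation; the step size `eta`, noise level `sigma`, and time discretization are all illustrative assumptions:

```python
import numpy as np

# Toy comparison (assumed parameters, not the paper's ASGD setting):
# discrete SGD iterates on f(x) = x^2 / 2 with noisy gradients versus a
# generic Langevin-type SME
#     dX_t = -f'(X_t) dt + sqrt(eta) * sigma dW_t,
# simulated by Euler-Maruyama with time step dt = eta.
rng = np.random.default_rng(0)
eta, sigma = 0.05, 0.5      # learning rate and gradient-noise scale (assumed)
n_steps, n_paths = 200, 2000
x0 = 1.0

# Discrete SGD: x_{k+1} = x_k - eta * (f'(x_k) + noise)
x = np.full(n_paths, x0)
for _ in range(n_steps):
    x = x - eta * (x + sigma * rng.standard_normal(n_paths))

# SME via Euler-Maruyama: one step of size dt = eta per SGD iteration
dt = eta
y = np.full(n_paths, x0)
for _ in range(n_steps):
    y = y - y * dt + np.sqrt(eta) * sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

# Both ensembles should share the same mean decay and a similar
# stationary variance of order eta * sigma^2 / 2.
print(f"SGD  mean {x.mean():+.4f}, var {x.var():.4f}")
print(f"SME  mean {y.mean():+.4f}, var {y.var():.4f}")
```

The match between the two ensembles is the sense in which the SME "models" the discrete algorithm; the paper's contribution is deriving the analogous (Langevin-type) equation for the asynchronous, delayed-gradient setting.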
Pages: 851-873
Page count: 23
Related Papers
50 records in total
  • [21] Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent
    De Sa, Christopher
    Feldman, Matthew
    Re, Christopher
    Olukotun, Kunle
    44TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2017), 2017, : 561 - 574
  • [22] Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent
    Gess, Benjamin
    Kassing, Sebastian
    Konarovskyi, Vitalii
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [23] Stochastic Gradient Descent Variants for Corrupted Systems of Linear Equations
    Haddock, Jamie
    Needell, Deanna
    Rebrova, Elizaveta
    Swartworth, William
    2020 54TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2020, : 348 - 353
  • [24] Stochastic Gradient Descent Learns State Equations with Nonlinear Activations
    Oymak, Samet
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [25] Unforgeability in Stochastic Gradient Descent
    Baluta, Teodora
    Nikolic, Ivica
    Jain, Racchit
    Aggarwal, Divesh
    Saxena, Prateek
    PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023, : 1138 - 1152
  • [26] Preconditioned Stochastic Gradient Descent
    Li, Xi-Lin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (05) : 1454 - 1466
  • [27] Stochastic gradient descent tricks
    Bottou, Léon
    Lecture Notes in Computer Science, 2012, 7700: 421 - 436
  • [28] Stochastic Reweighted Gradient Descent
    El Hanchi, Ayoub
    Stephens, David A.
    Maddison, Chris J.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [29] Byzantine Stochastic Gradient Descent
    Alistarh, Dan
    Allen-Zhu, Zeyuan
    Li, Jerry
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [30] An asynchronous distributed training algorithm based on Gossip communication and Stochastic Gradient Descent
    Tu, Jun
    Zhou, Jia
    Ren, Donglin
    COMPUTER COMMUNICATIONS, 2022, 195 : 416 - 423