Byzantine fault tolerance in distributed machine learning: a survey

Cited by: 0
Authors
Bouhata, Djamila [1, 2]
Moumen, Hamouma [1, 2]
Mazari, Jocelyn Ahmed [3, 4]
Bounceur, Ahcene [5]
Affiliations
[1] Univ Batna, Comp Sci Dept, 2 53 Constantine Rd, Batna 05078, Algeria
[2] Lab Applicat Math Comp & Elect, Comp Sci Dept, Batna, Algeria
[3] Sorbonne Univ, CNRS, ISIR, Paris, France
[4] Extrality, Paris, France
[5] Univ Sharjah, Informat Syst Dept, Sharjah, U Arab Emirates
Keywords
Byzantine fault tolerance; distributed machine learning; stochastic gradient descent; communication; optimisation; SUBGRADIENT METHODS; COORDINATE DESCENT; GRADIENT DESCENT; AGREEMENT; GENERALS;
DOI
10.1080/0952813X.2024.2391778
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Byzantine Fault Tolerance (BFT) is crucial for ensuring the resilience of Distributed Machine Learning (DML) systems during training under adversarial conditions. Despite the growing body of research on BFT in DML, there is no comprehensive classification of techniques or broad analysis of different approaches. This paper provides an in-depth survey of recent advancements in BFT for DML, with a focus on first-order optimisation methods during the training phase, particularly the widely used Stochastic Gradient Descent (SGD). We offer a novel classification of BFT approaches based on characteristics such as the communication process, optimisation method, and topology setting. This classification aims to enhance the understanding of various BFT methods and guide future research in addressing open challenges in the field. This work provides the foundations for developing robust BFT systems, using a variety of optimisation methods to strengthen resilience.
Pages: 59
Related papers
50 in total
  • [41] From distributed machine learning to federated learning: a survey
    Liu, Ji
    Huang, Jizhou
    Zhou, Yang
    Li, Xuhong
    Ji, Shilei
    Xiong, Haoyi
    Dou, Dejing
    KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64 (04) : 885 - 917
  • [43] SELF-LEARNING, HIGHLY-AVAILABLE METHODOLOGIES FOR BYZANTINE FAULT TOLERANCE
    Zhu, Yunyue
    Zhao, Xiaoxu
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER THEORY AND ENGINEERING (ICACTE 2009), VOLS 1 AND 2, 2009, : 1267 - 1273
  • [44] Fault Tolerance in Iterative-Convergent Machine Learning
    Qiao, Aurick
    Aragam, Bryon
    Zhang, Bingjing
    Xing, Eric P.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [45] Fault and Noise Tolerance in the Incremental Extreme Learning Machine
    Leung, Ho Chun
    Leung, Chi Sing
    Wong, Eric Wing Ming
    IEEE ACCESS, 2019, 7 : 155171 - 155183
  • [46] From distributed machine to distributed deep learning: a comprehensive survey
    Dehghani, Mohammad
    Yazdanparast, Zahra
    JOURNAL OF BIG DATA, 2023, 10 (01)
  • [47] From distributed machine to distributed deep learning: a comprehensive survey
    Mohammad Dehghani
    Zahra Yazdanparast
    Journal of Big Data, 10
  • [48] Interaction Patterns for Byzantine Fault Tolerance Computing
    Chai, Hua
    Zhao, Wenbing
    COMPUTER APPLICATIONS FOR WEB, HUMAN COMPUTER INTERACTION, SIGNAL AND IMAGE PROCESSING AND PATTERN RECOGNITION, 2012, 342 : 180 - 188
  • [49] Byzantine Fault-Tolerance with Commutative Commands
    Raykov, Pavel
    Schiper, Nicolas
    Pedone, Fernando
    PRINCIPLES OF DISTRIBUTED SYSTEMS, 2011, 7109 : 329 - +
  • [50] Application-Aware Byzantine Fault Tolerance
    Zhao, Wenbing
    2014 IEEE 12TH INTERNATIONAL CONFERENCE ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING (DASC)/2014 IEEE 12TH INTERNATIONAL CONFERENCE ON EMBEDDED COMPUTING (EMBEDDEDCOM)/2014 IEEE 12TH INTERNATIONAL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING (PICOM), 2014, : 45 - 50