Byzantine fault tolerance in distributed machine learning: a survey

Cited by: 0
Authors
Bouhata, Djamila [1, 2]
Moumen, Hamouma [1, 2]
Mazari, Jocelyn Ahmed [3, 4]
Bounceur, Ahcene [5]
Affiliations
[1] Univ Batna, Comp Sci Dept, 2 53 Constantine Rd, Batna 05078, Algeria
[2] Lab Applicat Math Comp & Elect, Comp Sci Dept, Batna, Algeria
[3] Sorbonne Univ, CNRS, ISIR, Paris, France
[4] Extrality, Paris, France
[5] Univ Sharjah, Informat Syst Dept, Sharjah, U Arab Emirates
Keywords
Byzantine fault tolerance; distributed machine learning; stochastic gradient descent; communication; optimisation; subgradient methods; coordinate descent; gradient descent; agreement; generals
DOI
10.1080/0952813X.2024.2391778
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Byzantine Fault Tolerance (BFT) is crucial for ensuring the resilience of Distributed Machine Learning (DML) systems during training under adversarial conditions. Despite the growing body of research on BFT in DML, there is no comprehensive classification of techniques or broad analysis of the different approaches. This paper provides an in-depth survey of recent advances in BFT for DML, focusing on first-order optimisation methods used during the training phase, particularly the popular Stochastic Gradient Descent (SGD). We offer a novel classification of BFT approaches based on characteristics such as the communication process, optimisation method, and topology setting. This classification aims to improve understanding of the various BFT methods and to guide future research on the open challenges in the field. This work lays the foundations for developing robust BFT systems that use a variety of optimisation methods to strengthen resilience.
Pages: 59