Distributed Machine Learning based Mitigating Straggler in Big Data Environment

Cited by: 1
Authors
Lu, Haodong [1]
Wang, Kun [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Internet Things, Nanjing, Peoples R China
[2] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA USA
Funding
National Natural Science Foundation of China
Keywords
Parameter Server; Straggler; Deep Reinforcement Learning;
DOI
10.1109/ICC42927.2021.9500531
Chinese Library Classification
TN [Electronic Technology, Communication Technology]
Discipline Classification Code
0809
Abstract
In the big data era, the parameter server paradigm is regarded as an efficient and practical way to improve the performance of deep learning (DL) applications. A main problem is that stragglers greatly hinder DL training progress, and previous methods do not fully consider the resource utilization of the cluster when dealing with them. To mitigate the straggler problem in the parameter server, we propose a Deep Reinforcement Learning (DRL)-based framework called Distributed Actor-critic Reinforcement Learning (DARL), which automatically adapts each worker's training load to the dynamic cluster without requiring parameter settings. DARL employs state-of-the-art techniques to stabilize training and improve convergence, including a distributed framework, multiple actors, and prioritized experience replay. We also apply a customized experience sampling method to fully exploit potentially good samples. Experiments using real DL workloads show that DARL outperforms the representative Bulk Synchronous Parallel (BSP) scheme by 57.8% and Stale Synchronous Parallel (SSP) by 503% in terms of per-iteration time in a heterogeneous environment.
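The straggler effect described in the abstract can be illustrated with a minimal sketch. It is not the DARL method: it assumes a simple cost model in which a worker's time equals its load divided by its speed, and it swaps the learned DRL policy for a static, speed-proportional split. The function names, speeds, and load values are all hypothetical.

```python
def bsp_iteration_time(loads, speeds):
    """Per-iteration time under a synchronous barrier (BSP-style).

    Assumed cost model: worker i needs loads[i] / speeds[i] seconds,
    and the iteration ends only when the slowest worker finishes.
    """
    return max(load / speed for load, speed in zip(loads, speeds))


def proportional_loads(total_load, speeds):
    """Split total_load in proportion to worker speed.

    Hypothetical static heuristic used here only to show the effect of
    load adaptation; it is not the learned policy from the paper.
    """
    total_speed = sum(speeds)
    return [total_load * s / total_speed for s in speeds]


# Hypothetical cluster: three fast workers and one straggler (samples/second).
speeds = [4.0, 4.0, 4.0, 1.0]
total_load = 1300.0  # samples per iteration (illustrative value)

equal = [total_load / len(speeds)] * len(speeds)
balanced = proportional_loads(total_load, speeds)

equal_time = bsp_iteration_time(equal, speeds)        # straggler dominates
balanced_time = bsp_iteration_time(balanced, speeds)  # all workers finish together
```

With an equal split, the straggler alone determines the iteration time; with the speed-proportional split, every worker finishes at the same moment. A static split like this cannot follow a cluster whose speeds change over time, which is the gap an adaptive scheme aims to close.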
Pages: 6