Distributed Machine Learning based Mitigating Straggler in Big Data Environment

Authors
Lu, Haodong [1]
Wang, Kun [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Internet Things, Nanjing, Peoples R China
[2] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA USA
Funding
National Natural Science Foundation of China
Keywords
Parameter Server; Straggler; Deep Reinforcement Learning;
DOI
10.1109/ICC42927.2021.9500531
Chinese Library Classification number
TN [Electronic Technology, Communication Technology]
Discipline classification code
0809
Abstract
In the big data era, the parameter server paradigm is regarded as an efficient and practical way to improve performance when processing deep learning (DL) applications. One of the main problems is that stragglers greatly hinder DL training progress, and previous methods do not fully consider the resource utilization of the cluster when dealing with them. To mitigate the straggler problem in the parameter server, we propose a Deep Reinforcement Learning (DRL)-based framework called Distributed Actor-critic Reinforcement Learning (DARL) that automatically adapts each worker's training load to the dynamic cluster without requiring parameter settings. DARL employs state-of-the-art techniques to stabilize training and improve convergence, including a distributed framework, multiple actors, and prioritized experience replay. We also apply a customized experience sampling method to fully exploit potentially good samples. Experiments using real DL workloads show that DARL outperforms the representative Bulk Synchronous Parallel (BSP) scheme by 57.8% and Stale Synchronous Parallel (SSP) by 503% in terms of per-iteration time in a heterogeneous environment.
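To make the straggler problem concrete, the following minimal sketch shows why one slow worker dominates per-iteration time under BSP, and how resizing each worker's training load can remove the wait. It is not DARL: the abstract does not give the algorithm's details, so a simple throughput-proportional split stands in for the learned policy. The function names, the throughput figures, and the batch size are all hypothetical.

```python
def bsp_iteration_time(batch_sizes, throughputs):
    """Per-iteration time under BSP: every worker waits for the slowest one."""
    return max(b / t for b, t in zip(batch_sizes, throughputs))


def proportional_batches(total_batch, throughputs):
    """Split total_batch across workers in proportion to their throughput.

    Illustrative stand-in only; DARL learns the load assignment with DRL.
    """
    total = sum(throughputs)
    sizes = [int(total_batch * t / total) for t in throughputs]
    # Rounding down can leave a few samples unassigned; give them to the fastest workers.
    leftover = total_batch - sum(sizes)
    fastest_first = sorted(range(len(throughputs)),
                           key=lambda i: throughputs[i], reverse=True)
    for i in fastest_first[:leftover]:
        sizes[i] += 1
    return sizes


# Hypothetical cluster: three healthy workers and one straggler (samples per second).
throughputs = [100.0, 100.0, 100.0, 25.0]
total_batch = 520

equal = [total_batch // len(throughputs)] * len(throughputs)
adapted = proportional_batches(total_batch, throughputs)

print("equal split  :", equal, bsp_iteration_time(equal, throughputs))
print("adapted split:", adapted, bsp_iteration_time(adapted, throughputs))
```

With an equal split, the iteration lasts as long as the straggler needs for its share; with the adapted split, all workers finish at about the same time. A static split like this assumes throughputs are known and fixed, which is the limitation a learned policy for a dynamic cluster is meant to address.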
Pages: 6