Distributed Machine Learning based Mitigating Straggler in Big Data Environment

Cited by: 1
Authors
Lu, Haodong [1]
Wang, Kun [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Internet Things, Nanjing, Peoples R China
[2] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA USA
Funding
National Natural Science Foundation of China
Keywords
Parameter Server; Straggler; Deep Reinforcement Learning
DOI
10.1109/ICC42927.2021.9500531
Chinese Library Classification number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract
In the big data era, the parameter server paradigm is regarded as an efficient and practical way to improve performance when processing deep learning (DL) applications. A main problem is that stragglers greatly hinder DL training progress, and previous methods do not fully account for the resource utilization of the cluster when dealing with them. To mitigate the straggler problem in the parameter server, we propose a Deep Reinforcement Learning (DRL)-based framework called Distributed Actor-critic Reinforcement Learning (DARL) that automatically adapts each worker's training load to the dynamic cluster without parameter settings. DARL employs state-of-the-art techniques to stabilize training and improve convergence, including a distributed framework, multiple actors, and prioritized experience replay. We also apply a customized experience sampling method to fully exploit potentially good samples. Experiments using real DL workloads show that DARL outperforms the representative Bulk Synchronous Parallel (BSP) scheme by 57.8% and Stale Synchronous Parallel (SSP) by 503% in terms of per-iteration time in a heterogeneous environment.
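The abstract's core idea, adapting each worker's training load to a heterogeneous cluster, can be illustrated with a small Python sketch. This is not DARL and not the authors' method: it replaces the learned DRL policy with a simple static rule (load proportional to worker speed), and the function names, worker speeds, and batch size are all hypothetical values chosen for illustration. It only shows why a straggler dominates per-iteration time under BSP and why rebalancing load can help.

```python
def bsp_iteration_time(loads, speeds):
    """Under BSP, an iteration finishes only when the slowest worker finishes."""
    return max(load / speed for load, speed in zip(loads, speeds))


def proportional_loads(total_load, speeds):
    """Split total_load across workers in proportion to their speed.

    Assumption: a static stand-in for a learned load-assignment policy.
    """
    total_speed = sum(speeds)
    return [total_load * s / total_speed for s in speeds]


# Hypothetical cluster: speeds in samples/second; the last worker is a straggler.
speeds = [4.0, 4.0, 4.0, 1.0]
total_samples = 1040  # hypothetical global batch size

equal = [total_samples / len(speeds)] * len(speeds)
balanced = proportional_loads(total_samples, speeds)

t_equal = bsp_iteration_time(equal, speeds)        # straggler sets the pace
t_balanced = bsp_iteration_time(balanced, speeds)  # all workers finish together

print(f"equal split:        {t_equal:.1f} s per iteration")
print(f"proportional split: {t_balanced:.1f} s per iteration")
```

With an equal split, the straggler alone determines the iteration time. With the proportional split, every worker finishes at the same moment, so no worker sits idle at the synchronization barrier. A real cluster's speeds change over time, which is the situation a dynamic policy such as the one described in the abstract is meant to handle.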
Pages: 6