Distributed Machine Learning based Mitigating Straggler in Big Data Environment

Cited by: 1
Authors
Lu, Haodong [1]
Wang, Kun [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Internet Things, Nanjing, Peoples R China
[2] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA USA
Funding
National Natural Science Foundation of China
Keywords
Parameter Server; Straggler; Deep Reinforcement Learning
DOI
10.1109/ICC42927.2021.9500531
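Chinese Library Classification (CLC) number
TN [Electronic Technology, Communication Technology]
Discipline classification code
0809
Abstract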
In the big data era, the parameter server paradigm is regarded as an efficient and practical way to improve the performance of deep learning (DL) applications. One of its main problems is that stragglers greatly hinder DL training progress, and previous methods do not fully consider the resource utilization of the cluster when dealing with stragglers. To mitigate the straggler problem in the parameter server, we propose a Deep Reinforcement Learning (DRL)-based framework called Distributed Actor-critic Reinforcement Learning (DARL) that automatically adapts each worker's training load to the dynamic cluster without requiring parameter settings. DARL employs state-of-the-art techniques to stabilize training and improve convergence, including a distributed framework, multiple actors, and prioritized experience replay. We also apply a customized experience sampling method to fully exploit potentially good samples. Experiments using real DL workloads show that DARL outperforms the representative Bulk Synchronous Parallel (BSP) scheme by 57.8% and Stale Synchronous Parallel (SSP) by 503% in terms of per-iteration time in a heterogeneous environment.
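The straggler effect described in the abstract can be illustrated with a minimal sketch. It is not the DARL method: it only assumes a simple cost model in which a worker's time is its load divided by its speed, and that under BSP an iteration ends when the slowest worker finishes. The cluster speeds, the total load, and the function names `bsp_iteration_time` and `proportional_loads` are hypothetical, chosen for illustration.

```python
def bsp_iteration_time(loads, speeds):
    """Under BSP every worker waits at a barrier, so one iteration
    takes as long as the slowest worker (assumed cost: load / speed)."""
    return max(load / speed for load, speed in zip(loads, speeds))


def proportional_loads(total_load, speeds):
    """Split total_load across workers in proportion to their speed.
    This is a static heuristic, not a learned policy."""
    total_speed = sum(speeds)
    return [total_load * s / total_speed for s in speeds]


# Hypothetical cluster: samples processed per second by each worker.
speeds = [100.0, 100.0, 25.0]  # the third worker is a straggler
total_load = 900.0             # samples per global iteration

equal = [total_load / len(speeds)] * len(speeds)
balanced = proportional_loads(total_load, speeds)

t_equal = bsp_iteration_time(equal, speeds)
t_balanced = bsp_iteration_time(balanced, speeds)

print(f"equal split:        {t_equal} s per iteration")
print(f"proportional split: {t_balanced} s per iteration")
```

With an equal split the straggler sets the pace of every iteration; shifting load toward the faster workers removes the wait at the barrier. In this toy model the speeds are known and fixed, whereas the abstract targets a dynamic cluster where the load assignment has to be adapted automatically.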
Pages: 6
Related papers
50 in total
  • [31] Granular computing based machine learning in the era of big data
    Hu, Qinghua
    Mi, Jusheng
    Chen, Degang
    Information Sciences, 2022, 591 : 422 - 423
  • [32] Big Data Analysis of TV Dramas Based on Machine Learning
    Tan, Jiaqi
    Mao, Feiqiao
    Yang, Lianghai
    Wang, Jiahui
    SMART COMPUTING AND COMMUNICATION, SMARTCOM 2017, 2018, 10699 : 90 - 95
  • [33] Towards Mitigating Straggler with Deep Reinforcement Learning in Parameter Server
    Lu, Haodong
    Wang, Kun
    2020 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA (ICCC), 2020, : 829 - 834
  • [34] Modelling of healthcare data analytics using optimal machine learning model in big data environment
    Fancy, Chelladurai
    Krishnaraj, Nagappan
    Ishwarya, K.
    Raja, G.
    Chandrasekaran, Shyamala
    EXPERT SYSTEMS, 2025, 42 (01)
  • [35] Machine learning for big data analytics
    Oja, E. (erkki.oja@aalto.fi), 1600, Springer Verlag (384):
  • [36] Big data and machine learning in health
    Carvalho, D.
    Cruz, R.
    EUROPEAN JOURNAL OF PUBLIC HEALTH, 2020, 30 : 10 - 11
  • [37] Machine learning and big scientific data
    Hey, Tony
    Butler, Keith
    Jackson, Sam
    Thiyagalingam, Jeyarajan
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2020, 378 (2166):
  • [38] Machine Learning under Big Data
    Shi, Chunhe
    Wu, Chengdong
    Han, Xiaowei
    Xie, Yinghong
    Li, Zhen
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ELECTRONIC, MECHANICAL, INFORMATION AND MANAGEMENT SOCIETY (EMIM), 2016, 40 : 301 - 305
  • [39] Machine learning, big data, and neuroscience
    Pillow, Jonathan
    Sahani, Maneesh
    CURRENT OPINION IN NEUROBIOLOGY, 2019, 55 : III - IV
  • [40] Scalable malware detection system using big data and distributed machine learning approach
    Manish Kumar
    Soft Computing, 2022, 26 : 3987 - 4003