Strategies and Principles of Distributed Machine Learning on Big Data

Cited by: 99
Authors
Xing, Eric P. [1]
Ho, Qirong [1]
Xie, Pengtao [1]
Wei, Dai [1]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Keywords
Machine learning; Artificial intelligence; Big data; Big model; Distributed systems; Principles; Theory; Data-parallelism; Model-parallelism; REGRESSION; MODEL; SELECTION
DOI
10.1016/J.ENG.2016.02.008
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
The rise of big data has led to new demands for machine learning (ML) systems to learn complex models, with millions to billions of parameters, that promise adequate capacity to digest massive datasets and offer powerful predictive analytics (such as high-dimensional latent features, intermediate representations, and decision functions) thereupon. In order to run ML algorithms at such scales, on a distributed cluster with tens to thousands of machines, it is often the case that significant engineering efforts are required, and one might fairly ask whether such engineering truly falls within the domain of ML research. Taking the view that "big" ML systems can benefit greatly from ML-rooted statistical and algorithmic insights, and that ML researchers should therefore not shy away from such systems design, we discuss a series of principles and strategies distilled from our recent efforts on industrial-scale ML solutions. These principles and strategies span a continuum from application, to engineering, and to theoretical research and development of big ML systems and architectures, with the goal of understanding how to make them efficient, generally applicable, and supported with convergence and scaling guarantees. They concern four key questions that traditionally receive little attention in ML research: How can an ML program be distributed over a cluster? How can ML computation be bridged with inter-machine communication? How can such communication be performed? What should be communicated between machines?
By exposing underlying statistical and algorithmic characteristics unique to ML programs but not typically seen in traditional computer programs, and by dissecting successful cases to reveal how we have harnessed these principles to design and develop both high-performance distributed ML software as well as general-purpose ML frameworks, we present opportunities for ML researchers and practitioners to further shape and enlarge the area that lies between ML and systems. © 2016 THE AUTHORS. Published by Elsevier LTD on behalf of Chinese Academy of Engineering and Higher Education Press Limited Company. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 179-195
Number of pages: 17