Network Support for High-Performance Distributed Machine Learning

Cited by: 6
Authors
Malandrino, Francesco [1 ,2 ]
Chiasserini, Carla Fabiana [1 ,3 ]
Molner, Nuria [4 ,5 ]
de la Oliva, Antonio [6 ]
Affiliations
[1] CNR, IEIIT, I-10129 Turin, Italy
[2] CNIT, I-43124 Parma, Italy
[3] Politecn Torino, Dept Elect & Telecommun, I-10129 Turin, Italy
[4] Univ Carlos III Madrid, IMDEA Networks Inst, Madrid 28903, Spain
[5] Univ Politecn Valencia iTEAM UPV, Inst Univ Telecomunicac & Aplicac Multimedia, Valencia 46022, Spain
[6] Univ Carlos III Madrid, Dept Telemat Engn, Madrid 28903, Spain
Keywords
Task analysis; Topology; Network topology; Data models; Costs; Machine learning; Training; Network orchestration; machine learning; edge computing; EDGE;
DOI
10.1109/TNET.2022.3189077
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The traditional approach to distributed machine learning is to adapt learning algorithms to the network, e.g., reducing updates to curb overhead. Networks based on intelligent edge, instead, make it possible to follow the opposite approach, i.e., to define the logical network topology around the learning task to perform, so as to meet the desired learning performance. In this paper, we propose a system model that captures such aspects in the context of supervised machine learning, accounting for both learning nodes (that perform computations) and information nodes (that provide data). We then formulate the problem of selecting (i) which learning and information nodes should cooperate to complete the learning task, and (ii) the number of epochs to run, in order to minimize the learning cost while meeting the target prediction error and execution time. After proving important properties of the above problem, we devise an algorithm, named DoubleClimb, that can find a (1 + 1/|I|)-competitive solution (with I being the set of information nodes), with cubic worst-case complexity. Our performance evaluation, leveraging a real-world network topology and considering both classification and regression tasks, also shows that DoubleClimb closely matches the optimum, outperforming state-of-the-art alternatives.
Pages: 264-278
Number of pages: 15