Network Support for High-Performance Distributed Machine Learning

Cited by: 6

Authors:

Malandrino, Francesco [1,2]

Chiasserini, Carla Fabiana [1,3]

Molner, Nuria [4,5]

de la Oliva, Antonio [6]

Affiliations:

[1] CNR, IEIIT, I-10129 Turin, Italy

[2] CNIT, I-43124 Parma, Italy

[3] Politecn Torino, Dept Elect & Telecommun, I-10129 Turin, Italy

[4] Univ Carlos III Madrid, IMDEA Networks Inst, Madrid 28903, Spain

[5] Univ Politecn Valencia iTEAM UPV, Inst Univ Telecomunicac & Aplicac Multimedia, Valencia 46022, Spain

[6] Univ Carlos III Madrid, Dept Telemat Engn, Madrid 28903, Spain

Source:

IEEE-ACM TRANSACTIONS ON NETWORKING | 2023 / Vol. 31 / No. 01

Keywords:

Task analysis; Topology; Network topology; Data models; Costs; Machine learning; Training; Network orchestration; machine learning; edge computing; EDGE;

DOI:

10.1109/TNET.2022.3189077

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The traditional approach to distributed machine learning is to adapt learning algorithms to the network, e.g., reducing updates to curb overhead. Networks based on intelligent edge, instead, make it possible to follow the opposite approach, i.e., to define the logical network topology around the learning task to perform, so as to meet the desired learning performance. In this paper, we propose a system model that captures such aspects in the context of supervised machine learning, accounting for both learning nodes (that perform computations) and information nodes (that provide data). We then formulate the problem of selecting (i) which learning and information nodes should cooperate to complete the learning task, and (ii) the number of epochs to run, in order to minimize the learning cost while meeting the target prediction error and execution time. After proving important properties of the above problem, we devise an algorithm, named DoubleClimb, that can find a (1 + 1/|I|)-competitive solution (with I being the set of information nodes), with cubic worst-case complexity. Our performance evaluation, leveraging a real-world network topology and considering both classification and regression tasks, also shows that DoubleClimb closely matches the optimum, outperforming state-of-the-art alternatives.

Pages: 264-278

Number of pages: 15
