Performance Modeling for Distributed Training of Convolutional Neural Networks

Cited by: 2
Authors
Castello, Adrian [1]
Catalan, Mar [1]
Dolz, Manuel F. [1]
Mestre, Jose I. [1]
Quintana-Orti, Enrique S. [2]
Duato, Jose [2]
Affiliations
[1] Univ Jaume I, Castellon de la Plana, Spain
[2] Univ Politecn Valencia, Valencia, Spain
Keywords
Deep neural networks (DNNs); distributed training; analytical modeling; clusters; COLLECTIVE COMMUNICATION;
DOI
10.1109/PDP52278.2021.00024
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We perform a theoretical analysis comparing the scalability of data versus model parallelism, applied to the distributed training of deep convolutional neural networks (CNNs), along five axes: batch size, node (floating-point) arithmetic performance, node memory bandwidth, network link bandwidth, and cluster dimension. Our study relies on analytical performance models that can be configured to reproduce the components and organization of the CNN model as well as the hardware configuration of the target distributed platform. In addition, we provide evidence of the accuracy of the analytical models by performing a validation against a Python library for distributed deep learning training.
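As a rough illustration of what such an analytical model looks like, the sketch below estimates the per-iteration time of data-parallel training from the five axes listed in the abstract. This is a minimal, assumption-laden sketch, not the authors' model: the function, all parameter names, and the ring-allreduce cost formula are illustrative choices.

# Hypothetical sketch of an analytical model for data-parallel CNN training,
# in the spirit of the abstract above (NOT the paper's actual model).
# All names and the ring-allreduce cost formula are illustrative assumptions.

def step_time_data_parallel(
    flops_per_sample: float,   # forward+backward FLOPs per training sample
    bytes_per_sample: float,   # memory traffic per sample (bytes)
    params_bytes: float,       # total gradient/parameter size (bytes)
    batch_size: int,           # global batch size
    nodes: int,                # cluster dimension (number of nodes)
    node_flops: float,         # node arithmetic performance (FLOP/s)
    mem_bw: float,             # node memory bandwidth (bytes/s)
    link_bw: float,            # network link bandwidth (bytes/s)
) -> float:
    """Estimate per-iteration time: local compute plus gradient allreduce."""
    local = batch_size / nodes  # samples processed per node
    # Compute is bounded by whichever is slower on a node:
    # arithmetic throughput or memory traffic.
    t_compute = max(local * flops_per_sample / node_flops,
                    local * bytes_per_sample / mem_bw)
    # Bandwidth term of a ring allreduce over the gradients: each node
    # sends/receives 2 * (nodes - 1) / nodes of the model per iteration.
    t_comm = 2 * (nodes - 1) / nodes * params_bytes / link_bw
    return t_compute + t_comm

# Example: a ResNet-50-like model (~4 GFLOP and ~50 MB of memory traffic per
# sample, ~100 MB of gradients) on 8 nodes, each with 10 TFLOP/s and
# 500 GB/s of memory bandwidth, connected by 10 GB/s links, global batch 256.
print(f"{step_time_data_parallel(4e9, 5e7, 1e8, 256, 8, 1e13, 5e11, 1e10):.4f} s")

Plugging in different node counts exposes the trade-off the paper studies: the compute term shrinks with the cluster dimension while the allreduce term approaches a constant, so communication eventually dominates.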
Pages: 99-108
Number of pages: 10
Related Papers
50 records in total (items 21-30 shown)
  • [21] Convolutional Neural Network Training with Distributed K-FAC
    Pauloski, J. Gregory
    Zhang, Zhao
    Huang, Lei
    Xu, Weijia
    Foster, Ian T.
    PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20), 2020,
  • [22] Training Strategies for Convolutional Neural Networks with Transformed Input
    Khandani, Masoumeh Kalantari
    Mikhael, Wasfy B.
    2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021, : 1058 - 1061
  • [23] Efficient Incremental Training for Deep Convolutional Neural Networks
    Tao, Yudong
    Tu, Yuexuan
    Shyu, Mei-Ling
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 286 - 291
  • [24] Efficient Training of Convolutional Neural Nets on Large Distributed Systems
    Sreedhar, Dheeraj
    Saxena, Vaibhav
    Sabharwal, Yogish
    Verma, Ashish
    Kumar, Sameer
    2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2018, : 392 - 401
  • [25] Privacy preserving distributed training of neural networks
    Nikolaidis, Spyridon
    Refanidis, Ioannis
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (23): : 17333 - 17350
  • [27] A framework for parallel and distributed training of neural networks
    Scardapane, Simone
    Di Lorenzo, Paolo
    NEURAL NETWORKS, 2017, 91 : 42 - 54
  • [28] DeepTracker: Visualizing the Training Process of Convolutional Neural Networks
    Liu, Dongyu
    Cui, Weiwei
    Jin, Kai
    Guo, Yuxiao
    Qu, Huamin
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2019, 10 (01)
  • [29] CONVOLUTIONAL NEURAL NETWORKS AND TRAINING STRATEGIES FOR SKIN DETECTION
    Kim, Yoonsik
    Hwang, Insung
    Cho, Nam Ik
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3919 - 3923
  • [30] Facial Action Units for Training Convolutional Neural Networks
    Trinh Thi Doan Pham
    Won, Chee Sun
    IEEE ACCESS, 2019, 7 : 77816 - 77824