Performance Modeling for Distributed Training of Convolutional Neural Networks

Cited by: 2
Authors
Castello, Adrian [1]
Catalan, Mar [1]
Dolz, Manuel F. [1]
Mestre, Jose I. [1]
Quintana-Orti, Enrique S. [2]
Duato, Jose [2]
Affiliations
[1] Univ Jaume I, Castellon de la Plana, Spain
[2] Univ Politecn Valencia, Valencia, Spain
Keywords
Deep neural networks (DNNs); distributed training; analytical modeling; clusters; COLLECTIVE COMMUNICATION;
DOI
10.1109/PDP52278.2021.00024
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We perform a theoretical analysis comparing the scalability of data versus model parallelism, applied to the distributed training of deep convolutional neural networks (CNNs), along five axes: batch size, node (floating-point) arithmetic performance, node memory bandwidth, network link bandwidth, and cluster dimension. Our study relies on analytical performance models that can be configured to reproduce the components and organization of the CNN model as well as the hardware configuration of the target distributed platform. In addition, we provide evidence of the accuracy of the analytical models by performing a validation against a Python library for distributed deep learning training.
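As a rough illustration of what such an analytical model looks like, the sketch below estimates the per-iteration time of data-parallel training from the five axes listed in the abstract. This is a minimal, assumption-laden sketch, not the authors' model: the function, all parameter names, and the ring-allreduce cost formula are illustrative choices.

# Hypothetical sketch of an analytical model for data-parallel CNN training,
# in the spirit of the abstract above (NOT the paper's actual model).
# All names and the ring-allreduce cost formula are illustrative assumptions.

def step_time_data_parallel(
    flops_per_sample: float,   # forward+backward FLOPs per training sample
    bytes_per_sample: float,   # memory traffic per sample (bytes)
    params_bytes: float,       # total gradient/parameter size (bytes)
    batch_size: int,           # global batch size
    nodes: int,                # cluster dimension (number of nodes)
    node_flops: float,         # node arithmetic performance (FLOP/s)
    mem_bw: float,             # node memory bandwidth (bytes/s)
    link_bw: float,            # network link bandwidth (bytes/s)
) -> float:
    """Estimate per-iteration time: local compute plus gradient allreduce."""
    local = batch_size / nodes  # samples processed per node
    # Compute is bounded by whichever is slower on a node:
    # arithmetic throughput or memory traffic.
    t_compute = max(local * flops_per_sample / node_flops,
                    local * bytes_per_sample / mem_bw)
    # Bandwidth term of a ring allreduce over the gradients: each node
    # sends/receives 2 * (nodes - 1) / nodes of the model per iteration.
    t_comm = 2 * (nodes - 1) / nodes * params_bytes / link_bw
    return t_compute + t_comm

# Example: a ResNet-50-like model (~4 GFLOP and ~50 MB of memory traffic per
# sample, ~100 MB of gradients) on 8 nodes, each with 10 TFLOP/s and
# 500 GB/s of memory bandwidth, connected by 10 GB/s links, global batch 256.
print(f"{step_time_data_parallel(4e9, 5e7, 1e8, 256, 8, 1e13, 5e11, 1e10):.4f} s")

Plugging in different node counts exposes the trade-off the paper studies: the compute term shrinks with the cluster dimension while the allreduce term approaches a constant, so communication eventually dominates.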
Pages: 99-108
Number of pages: 10
Related Papers
50 records in total (items 21-30 shown)
  • [21] Convolutional Neural Network Training with Distributed K-FAC
    Pauloski, J. Gregory
    Zhang, Zhao
    Huang, Lei
    Xu, Weijia
    Foster, Ian T.
    PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20), 2020,
  • [22] Training Strategies for Convolutional Neural Networks with Transformed Input
    Khandani, Masoumeh Kalantari
    Mikhael, Wasfy B.
    2021 IEEE INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2021, : 1058 - 1061
  • [23] Efficient Incremental Training for Deep Convolutional Neural Networks
    Tao, Yudong
    Tu, Yuexuan
    Shyu, Mei-Ling
    2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019, : 286 - 291
  • [24] Efficient Training of Convolutional Neural Nets on Large Distributed Systems
    Sreedhar, Dheeraj
    Saxena, Vaibhav
    Sabharwal, Yogish
    Verma, Ashish
    Kumar, Sameer
    2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2018, : 392 - 401
  • [25] Privacy preserving distributed training of neural networks
    Nikolaidis, Spyridon
    Refanidis, Ioannis
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (23): : 17333 - 17350
  • [27] A framework for parallel and distributed training of neural networks
    Scardapane, Simone
    Di Lorenzo, Paolo
    NEURAL NETWORKS, 2017, 91 : 42 - 54
  • [28] DeepTracker: Visualizing the Training Process of Convolutional Neural Networks
    Liu, Dongyu
    Cui, Weiwei
    Jin, Kai
    Guo, Yuxiao
    Qu, Huamin
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2019, 10 (01)
  • [29] CONVOLUTIONAL NEURAL NETWORKS AND TRAINING STRATEGIES FOR SKIN DETECTION
    Kim, Yoonsik
    Hwang, Insung
    Cho, Nam Ik
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3919 - 3923
  • [30] Facial Action Units for Training Convolutional Neural Networks
    Trinh Thi Doan Pham
    Won, Chee Sun
    IEEE ACCESS, 2019, 7 : 77816 - 77824