DeepClone: Lightweight State Replication of Deep Learning Models for Data Parallel Training

Cited by: 2
Authors
Nicolae, Bogdan [1 ]
Wozniak, Justin M. [1 ]
Dorier, Matthieu [1 ]
Cappello, Franck [1 ]
Affiliation
[1] Argonne Natl Lab, Argonne, IL 60439 USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2020) | 2020
Keywords
deep learning; data-parallel training; layer-wise parallelism; model cloning; state replication;
DOI
10.1109/CLUSTER49012.2020.00033
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Training modern deep neural network (DNN) models involves complex workflows triggered by model exploration, sensitivity analysis, explainability, etc. A key primitive in this context is the ability to clone a model training instance, i.e., to "fork" the training process in a potentially different direction, which enables comparisons of different evolution paths using variations of training data and model parameters. However, in a quest to improve the training throughput, a mix of data-parallel, model-parallel, pipeline-parallel and layer-wise parallel approaches is making the problem of cloning highly complex. In this paper, we explore the problem of efficient cloning under such circumstances. To this end, we leverage several properties of data-parallel training and layer-wise parallelism to design DeepClone, a cloning approach based on augmenting the execution graph to gain direct access to tensors, which are then sharded and reconstructed asynchronously in order to minimize runtime overhead, standby duration, and readiness duration. Compared with state-of-the-art approaches, DeepClone shows orders of magnitude improvement for several classes of DNN models.
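The "sharded and reconstructed asynchronously" step in the abstract can be pictured with a small sketch. This is not DeepClone's implementation: it does not touch an execution graph, plain Python lists stand in for tensors, and the names `shard`, `clone_state`, and `model_state` are hypothetical, chosen only for illustration. It shows the general idea of splitting each tensor into pieces, copying the pieces on background threads, and reassembling them in order on the clone side.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(tensor, num_shards):
    """Split a flat list into num_shards contiguous pieces of near-equal size."""
    base, extra = divmod(len(tensor), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        # The first `extra` shards take one additional element each.
        size = base + (1 if i < extra else 0)
        shards.append(tensor[start:start + size])
        start += size
    return shards


def clone_state(state, num_shards=4):
    """Copy every tensor in `state` shard by shard on background threads.

    Assumption: `state` maps tensor names to flat lists of numbers.
    """
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # Submit one copy task per shard; the copies proceed concurrently.
        pending = {
            name: [pool.submit(list, piece) for piece in shard(tensor, num_shards)]
            for name, tensor in state.items()
        }
        # Reconstruct each tensor by concatenating its shards in order.
        return {
            name: [value for future in futures for value in future.result()]
            for name, futures in pending.items()
        }


# Hypothetical model state used only to exercise the sketch.
model_state = {
    "dense/kernel": [0.1, 0.2, 0.3, 0.4, 0.5],
    "dense/bias": [0.0, 1.0],
}
clone = clone_state(model_state)
```

The clone holds the same values as the original but in independent lists, so further updates to one training instance would not affect the other.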
Pages: 226-236
Page count: 11