Performance and Consistency Analysis for Distributed Deep Learning Applications

Cited by: 0
Authors
Jia, Danlin [1]
Saha, Manoj Pravakar [2]
Bhimani, Janki [2]
Mi, Ningfang [1]
Affiliations
[1] Northeastern Univ, Boston, MA 02115 USA
[2] Florida Int Univ, Miami, FL 33199 USA
Funding
U.S. National Science Foundation
DOI
10.1109/IPCCC50635.2020.9391566
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Accelerating the training of Deep Neural Network (DNN) models is essential for applying deep learning techniques successfully in fields such as computer vision and speech recognition. Distributed frameworks help speed up training for large DNN models and datasets. Much work has been done to improve model accuracy and training efficiency, based on mathematical analysis of the computations in Convolutional Neural Networks (CNNs). However, to run distributed deep learning applications in the real world, users and developers need to consider the impact of how system resources are distributed. In this work, we deploy a real distributed deep learning cluster with multiple virtual machines. We conduct an in-depth analysis to understand the impacts of system configurations, distribution topologies, and application parameters on the latency and correctness of distributed deep learning applications. We analyze how performance varies under different model consistency and data parallelism settings by profiling run-time system utilization and tracking application activities. Based on our observations and analysis, we develop design guidelines for accelerating distributed deep learning training in virtualized environments.
Pages: 8
Related papers
50 in total
  • [21] Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems
    Yan, Feng
    Ruwase, Olatunji
    He, Yuxiong
    Chilimbi, Trishul
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 1355 - 1364
  • [22] High Performance Distributed Deep Learning: A Beginner's Guide
    Panda, Dhabaleswar K.
    Awan, Ammar Ahmad
    Subramoni, Hari
    PROCEEDINGS OF THE 24TH SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '19), 2019, : 452 - 454
  • [23] Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method
    Tsuji, Yohei
    Osawa, Kazuki
    Ueno, Yuichiro
    Naruse, Akira
    Yokota, Rio
    Matsuoka, Satoshi
    PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP 2019), 2019,
  • [24] Causal Consistency for Distributed Data Stores and Applications as They are
    Shudo, Kazuyuki
    Yaguchi, Takashi
    PROCEEDINGS 2016 IEEE 40TH ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE WORKSHOPS, VOL 1, 2016, : 602 - 607
  • [25] Building a framework for the consistency management of distributed applications
    Bilicki, Vilmos
    Dombi, Jozsef Daniel
    NET TECHNOLOGIES 2006, FULL PAPERS PROCEEDINGS, 2006, : 55 - 62
  • [26] Understanding Distributed Deep Learning Performance by Correlating HPC and Machine Learning Measurements
    Veroneze Solorzano, Ana Luisa
    Schnorr, Lucas Mello
    HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2022, 2022, 13289 : 275 - 292
  • [27] A Distributed Cache Mechanism of HDFS to Improve Learning Performance for Deep Reinforcement Learning
    Gao, Yongqiang
    Deng, Shunyi
    Li, Zhenkun
    2022 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING, ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM, 2022, : 280 - 285
  • [28] Spark Based Distributed Deep Learning Framework For Big Data Applications
    Khumoyun, Akhmedov
    Cui, Yun
    Hanku, Lee
    2016 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND COMMUNICATIONS TECHNOLOGIES (ICISCT), 2016,
  • [29] Deep Metric Learning with Graph Consistency
    Chen, Binghui
    Li, Pengyu
    Yan, Zhaoyi
    Wang, Biao
    Zhang, Lei
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 982 - 990
  • [30] Deep Learning Applications in Medical Image Analysis
    Ker, Justin
    Wang, Lipo
    Rao, Jai
    Lim, Tchoyoson
    IEEE ACCESS, 2018, 6 : 9375 - 9389