Towards a Flexible and High-Fidelity Approach to Distributed DNN Training Emulation

Authors
Liu, Banruo [1, 2]
Ojewale, Mubarak Adetunji [2 ]
Ding, Yuhan [1 ]
Canini, Marco [2 ]
Affiliations
[1] Tsinghua University, Beijing, People's Republic of China
[2] KAUST, Thuwal, Saudi Arabia
Keywords
Distributed Deep Learning Training; Machine Learning Systems; DNN Training Emulation
DOI
10.1145/3678015.3680478
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose NeuronaBox, a flexible, user-friendly, and high-fidelity approach to emulate DNN training workloads. We argue that to accurately observe performance, it is possible to execute the training workload on a subset of real nodes and emulate the networked execution environment and the collective communication operations. Initial results from a proof-of-concept implementation show that NeuronaBox replicates the behavior of actual systems with high accuracy, with an error margin of less than 1% between the emulated measurements and the real system.
Pages: 88-94
Page count: 7