Peta-Scale Embedded Photonics Architecture for Distributed Deep Learning Applications

Cited by: 4
Authors
Wu, Zhenguo [1 ]
Dai, Liang Yuan [1 ]
Novick, Asher [1 ]
Glick, Madeleine [1 ]
Zhu, Ziyi [1 ]
Rumley, Sebastien [2 ]
Michelogiannakis, George [3 ]
Shalf, John [3 ]
Bergman, Keren [4 ]
Affiliations
[1] Columbia Univ, Elect Engn, New York, NY 10027 USA
[2] Univ Appl Sci & Arts Western Switzerland, Elect Engn, CH-2800 Delemont, Switzerland
[3] Lawrence Berkeley Natl Lab, Comp Sci, Berkeley, CA 94720 USA
[4] Columbia Univ, Elect Engn Dept, New York, NY 10027 USA
Keywords
Distributed deep learning; collective communication; silicon photonics; optical interconnect
DOI
10.1109/JLT.2023.3276588
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
As Deep Learning (DL) models grow larger and more complex, training jobs are increasingly distributed across multiple Computing Units (CUs) such as GPUs and TPUs. Each CU processes a sub-part of the model and synchronizes results with the others. Communication among these CUs has emerged as a key bottleneck in the training process. In this work, we present SiPAC, a Silicon Photonic Accelerated Compute cluster. SiPAC accelerates distributed DL training by means of two co-designed components: a photonic physical layer and a novel collective algorithm. The physical layer exploits embedded photonics to bring peta-scale I/O directly to the CUs of a DL-optimized cluster and uses resonator-based optical wavelength selectivity to realize hardware multicasting. The collective algorithm builds on this hardware multicasting primitive. The combination expedites a variety of collective communications commonly employed in DL training and has the potential to drastically ease communication bottlenecks. We demonstrate the feasibility of the SiPAC architecture through 1) an optical testbed experiment in which an array of comb laser wavelengths is shuffled by a cascaded ring switch, with each ring selecting and forwarding multiple wavelengths to increase the effective communication bandwidth, thereby demonstrating the hardware multicasting primitive, and 2) a four-GPU testbed running a realistic DL workload that achieves a 22% system-level performance improvement relative to a similarly sized leaf-spine topology. Large-scale simulations show that SiPAC achieves a 1.4x to 5.9x communication time reduction compared to state-of-the-art compute clusters for representative collective communications.
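To make the multicast argument concrete, the following minimal Python sketch (ours, not the authors' simulator) compares AllReduce communication time under a simple alpha-beta cost model: a conventional ring AllReduce versus an idealized variant whose AllGather phase completes in a single hop via hardware multicasting, as the abstract describes. All names and parameter values (cluster size, buffer size, per-message latency, link bandwidth) are placeholder assumptions.

    # Illustrative sketch, not the paper's simulator: alpha-beta cost model
    # comparing ring AllReduce with a multicast-assisted AllReduce.

    def ring_allreduce_time(p, n_bytes, alpha, beta):
        """Classic ring AllReduce: 2*(p-1) serial steps, each moving n/p bytes."""
        return 2 * (p - 1) * (alpha + (n_bytes / p) * beta)

    def multicast_allreduce_time(p, n_bytes, alpha, beta):
        """Assumed idealization: ReduceScatter still takes p-1 ring steps, but
        the AllGather collapses to one step because each node multicasts its
        reduced shard to all peers at once on dedicated wavelengths."""
        reduce_scatter = (p - 1) * (alpha + (n_bytes / p) * beta)
        allgather = alpha + (n_bytes / p) * beta  # single multicast hop
        return reduce_scatter + allgather

    if __name__ == "__main__":
        p = 8                # CUs in the cluster (assumed)
        n = 1 << 30          # 1 GiB gradient buffer (assumed)
        alpha = 2e-6         # per-message latency in seconds (assumed)
        beta = 1 / 400e9     # seconds per byte at 400 GB/s per link (assumed)
        t_ring = ring_allreduce_time(p, n, alpha, beta)
        t_mc = multicast_allreduce_time(p, n, alpha, beta)
        print(f"ring AllReduce:      {t_ring * 1e3:.2f} ms")
        print(f"multicast AllReduce: {t_mc * 1e3:.2f} ms ({t_ring / t_mc:.2f}x)")

Under these assumptions the AllGather phase shrinks from p-1 serial hops to one, which is the qualitative source of the communication-time reductions the abstract reports; the actual gains depend on cluster size, collective type, and the wavelength fan-out of the ring switch.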
Pages: 3737 - 3749
Page count: 13
Related Papers
50 in total
  • [41] Communication Algorithm-Architecture Co-Design for Distributed Deep Learning
    Huang, Jiayi
    Majumder, Pritam
    Kim, Sungkeun
    Muzahid, Abdullah
    Yum, Ki Hwan
    Kim, Eun Jung
    2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, : 181 - 194
  • [42] Distributed deep reinforcement learning architecture for task offloading in autonomous IoT systems
    Boni, Abdel Kader Chabi Sika
    Hablatou, Youssef
    Hassan, Hassan
    Drira, Khalil
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON THE INTERNET OF THINGS 2022, IOT 2022, 2022, : 112 - 118
  • [43] Fully distributed actor-critic architecture for multitask deep reinforcement learning
    Valcarcel Macua, Sergio
    Davies, Ian
    Tukiainen, Aleksi
    De Cote, Enrique Munoz
    KNOWLEDGE ENGINEERING REVIEW, 2021, 36
  • [44] An Incremental Iterative Acceleration Architecture in Distributed Heterogeneous Environments With GPUs for Deep Learning
    Zhang, Xuedong
    Tang, Zhuo
    Du, Lifan
    Yang, Li
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (11) : 2823 - 2837
  • [45] ShmCaffe: A Distributed Deep Learning Platform with Shared Memory Buffer for HPC Architecture
    Ahn, Shinyoung
    Kim, Joongheon
    Lim, Eunji
    Choi, Wan
    Mohaisen, Aziz
    Kang, Sungwon
    2018 IEEE 38TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2018, : 1118 - 1128
  • [46] Fast and scalable all-optical network architecture for distributed deep learning
    Li, Wenzhe
    Yuan, Guojun
    Wang, Zhan
    Tan, Guangming
    Zhang, Peiheng
    Rouskas, George N.
    JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING, 2024, 16 (03) : 342 - 357
  • [47] Special Section on Deep Learning Technologies: Architecture, Optimization, Techniques, and Applications
    Chen, Abel C. H.
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (05) : 579 - 580
  • [48] Spark Based Distributed Deep Learning Framework For Big Data Applications
    Khumoyun, Akhmedov
    Cui, Yun
    Hanku, Lee
    2016 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND COMMUNICATIONS TECHNOLOGIES (ICISCT), 2016,
  • [49] Neuromorphic silicon photonics with 50 GHz tiled matrix multiplication for deep-learning applications
    Giamougiannis, George
    Tsakyridis, Apostolos
    Moralis-Pegios, Miltiadis
    Mourgias-Alexandris, George
    Totovic, Angelina R.
    Dabos, George
    Kirtas, Manos
    Passalis, Nikolaos
    Tefas, Anastasios
    Kalavrouziotis, Dimitrios
    Syrivelis, Dimitris
    Bakopoulos, Paraskevas
    Mentovich, Elad
    Lazovsky, David
    Pleros, Nikos
    ADVANCED PHOTONICS, 2023, 5 (01): 55 - 62