Peta-Scale Embedded Photonics Architecture for Distributed Deep Learning Applications

Cited by: 4
Authors
Wu, Zhenguo [1 ]
Dai, Liang Yuan [1 ]
Novick, Asher [1 ]
Glick, Madeleine [1 ]
Zhu, Ziyi [1 ]
Rumley, Sebastien [2 ]
Michelogiannakis, George [3 ]
Shalf, John [3 ]
Bergman, Keren [4 ]
Affiliations
[1] Columbia Univ, Elect Engn, New York, NY 10027 USA
[2] Univ Appl Sci & Arts Western Switzerland, Elect Engn, CH-2800 Delemont, Switzerland
[3] Lawrence Berkeley Natl Lab, Comp Sci, Berkeley, CA 94720 USA
[4] Columbia Univ, Elect Engn Dept, New York, NY 10027 USA
Keywords
Distributed deep learning; collective communication; silicon photonics; optical interconnect
DOI
10.1109/JLT.2023.3276588
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
As Deep Learning (DL) models grow larger and more complex, training jobs are increasingly distributed across multiple Computing Units (CUs) such as GPUs and TPUs. Each CU processes a sub-part of the model and synchronizes results with the others. Communication among these CUs has emerged as a key bottleneck in the training process. In this work, we present SiPAC, a Silicon Photonic Accelerated Compute cluster. SiPAC accelerates distributed DL training by means of two co-designed components: a photonic physical layer and a novel collective algorithm. The physical layer exploits embedded photonics to bring peta-scale I/O directly to the CUs of a DL-optimized cluster and uses resonator-based optical wavelength selectivity to realize hardware multicasting. The collective algorithm builds on this hardware multicasting primitive. The combination expedites a variety of collective communications commonly employed in DL training and has the potential to drastically ease communication bottlenecks. We demonstrate the feasibility of the SiPAC architecture through 1) an optical testbed experiment in which an array of comb laser wavelengths is shuffled by a cascaded ring switch, with each ring selecting and forwarding multiple wavelengths to increase the effective communication bandwidth, thereby demonstrating the hardware multicasting primitive, and 2) a four-GPU testbed running a realistic DL workload that achieves a 22% system-level performance improvement relative to a similarly sized leaf-spine topology. Large-scale simulations show that SiPAC achieves a 1.4x to 5.9x communication time reduction compared to state-of-the-art compute clusters for representative collective communications.
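To make the multicast argument concrete, the following minimal Python sketch (ours, not the authors' simulator) compares AllReduce communication time under a simple alpha-beta cost model: a conventional ring AllReduce versus an idealized variant whose AllGather phase completes in a single hop via hardware multicasting, as the abstract describes. All names and parameter values (cluster size, buffer size, per-message latency, link bandwidth) are placeholder assumptions.

    # Illustrative sketch, not the paper's simulator: alpha-beta cost model
    # comparing ring AllReduce with a multicast-assisted AllReduce.

    def ring_allreduce_time(p, n_bytes, alpha, beta):
        """Classic ring AllReduce: 2*(p-1) serial steps, each moving n/p bytes."""
        return 2 * (p - 1) * (alpha + (n_bytes / p) * beta)

    def multicast_allreduce_time(p, n_bytes, alpha, beta):
        """Assumed idealization: ReduceScatter still takes p-1 ring steps, but
        the AllGather collapses to one step because each node multicasts its
        reduced shard to all peers at once on dedicated wavelengths."""
        reduce_scatter = (p - 1) * (alpha + (n_bytes / p) * beta)
        allgather = alpha + (n_bytes / p) * beta  # single multicast hop
        return reduce_scatter + allgather

    if __name__ == "__main__":
        p = 8                # CUs in the cluster (assumed)
        n = 1 << 30          # 1 GiB gradient buffer (assumed)
        alpha = 2e-6         # per-message latency in seconds (assumed)
        beta = 1 / 400e9     # seconds per byte at 400 GB/s per link (assumed)
        t_ring = ring_allreduce_time(p, n, alpha, beta)
        t_mc = multicast_allreduce_time(p, n, alpha, beta)
        print(f"ring AllReduce:      {t_ring * 1e3:.2f} ms")
        print(f"multicast AllReduce: {t_mc * 1e3:.2f} ms ({t_ring / t_mc:.2f}x)")

Under these assumptions the AllGather phase shrinks from p-1 serial hops to one, which is the qualitative source of the communication-time reductions the abstract reports; the actual gains depend on cluster size, collective type, and the wavelength fan-out of the ring switch.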
Pages: 3737 - 3749
Page count: 13
Related Papers
50 in total
  • [41] Communication Algorithm-Architecture Co-Design for Distributed Deep Learning
    Huang, Jiayi
    Majumder, Pritam
    Kim, Sungkeun
    Muzahid, Abdullah
    Yum, Ki Hwan
    Kim, Eun Jung
    2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021), 2021, : 181 - 194
  • [42] Distributed deep reinforcement learning architecture for task offloading in autonomous IoT systems
    Boni, Abdel Kader Chabi Sika
    Hablatou, Youssef
    Hassan, Hassan
    Drira, Khalil
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON THE INTERNET OF THINGS 2022, IOT 2022, 2022, : 112 - 118
  • [43] Fully distributed actor-critic architecture for multitask deep reinforcement learning
    Valcarcel Macua, Sergio
    Davies, Ian
    Tukiainen, Aleksi
    De Cote, Enrique Munoz
    KNOWLEDGE ENGINEERING REVIEW, 2021, 36
  • [44] An Incremental Iterative Acceleration Architecture in Distributed Heterogeneous Environments With GPUs for Deep Learning
    Zhang, Xuedong
    Tang, Zhuo
    Du, Lifan
    Yang, Li
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (11) : 2823 - 2837
  • [45] ShmCaffe: A Distributed Deep Learning Platform with Shared Memory Buffer for HPC Architecture
    Ahn, Shinyoung
    Kim, Joongheon
    Lim, Eunji
    Choi, Wan
    Mohaisen, Aziz
    Kang, Sungwon
    2018 IEEE 38TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2018, : 1118 - 1128
  • [46] Fast and scalable all-optical network architecture for distributed deep learning
    Li, Wenzhe
    Yuan, Guojun
    Wang, Zhan
    Tan, Guangming
    Zhang, Peiheng
    Rouskas, George N.
    JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING, 2024, 16 (03) : 342 - 357
  • [47] Special Section on Deep Learning Technologies: Architecture, Optimization, Techniques, and Applications
    Chen, Abel C. H.
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (05) : 579 - 580
  • [48] Spark Based Distributed Deep Learning Framework For Big Data Applications
    Khumoyun, Akhmedov
    Cui, Yun
    Hanku, Lee
    2016 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND COMMUNICATIONS TECHNOLOGIES (ICISCT), 2016,
  • [49] Neuromorphic silicon photonics with 50 GHz tiled matrix multiplication for deep-learning applications
    Giamougiannis, George
    Tsakyridis, Apostolos
    Moralis-Pegios, Miltiadis
    Mourgias-Alexandris, George
    Totovic, Angelina R.
    Dabos, George
    Kirtas, Manos
    Passalis, Nikolaos
    Tefas, Anastasios
    Kalavrouziotis, Dimitrios
    Syrivelis, Dimitris
    Bakopoulos, Paraskevas
    Mentovich, Elad
    Lazovsky, David
    Pleros, Nikos
    ADVANCED PHOTONICS, 2023, 5 (01): 55 - 62