An Efficient GCNs Accelerator Using 3D-Stacked Processing-in-Memory Architectures

Cited by: 0
Authors
Wang, Runze [1 ,2 ,3 ]
Hu, Ao [1 ,2 ,3 ]
Zheng, Long [1 ,2 ,3 ]
Wang, Qinggang [1 ,2 ,3 ]
Yuan, Jingrui [1 ,2 ,3 ]
Liu, Haifeng [1 ,2 ,3 ]
Yu, Linchen [4 ]
Liao, Xiaofei [1 ,2 ]
Jin, Hai [1 ,2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[3] Graph Proc Res Ctr, Zhejiang Lab, Hangzhou 311121, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China
Keywords
3D-stacked memory; accelerators; graph convolutional networks (GCNs); processing-in-memory (PIM);
DOI
10.1109/TCAD.2023.3341753
CLC Number
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
Graph convolutional networks (GCNs) hold great promise for machine learning on graph-structured data. However, graph sparsity often results in a large number of irregular memory accesses, leading to inefficient data movement in existing GCN accelerators. With advances in 3D-stacking technology, the processing-in-memory (PIM) architecture has emerged as a promising solution for graph processing. Nevertheless, existing PIM accelerators face the challenges of irregular remote accesses in the aggregation phase of GCNs and dynamic workload variation between phases. In this article, we present GCNim, a PIM accelerator based on 3D-stacked memory, which features two key innovations in its computation model and hardware design. First, we present a PIM-based hybrid computation model that employs a remote merging strategy to realize the outer product in aggregation and the row-wise product in combination. Second, GCNim builds a three-stage aggregation and combination pipeline and integrates unified processing elements (PEs) supporting these three stages at the bank level, achieving load balance among PEs through a lightweight data placement algorithm. Compared with state-of-the-art software frameworks running on CPUs and GPUs, GCNim achieves average speedups of 3,736.06x and 76.56x, respectively. Moreover, GCNim outperforms the state-of-the-art GCN hardware accelerators I-GCN, PEDAL, FlowGNN, and GCIM, with average speedups of 3.35x, 8.97x, 2.24x, and 5.58x, respectively.
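The abstract contrasts two dataflows for a GCN layer: a row-wise product for the dense combination and an outer product, with merging of partial rows, for the sparse aggregation. The sketch below is a minimal, CPU-side illustration of that split for a single layer H' = A(HW); it is not GCNim's implementation, and the function names and toy matrices are assumptions made for illustration only.

```python
import numpy as np
from scipy.sparse import csc_matrix

def combination_rowwise(H, W):
    """Row-wise product: each output row is one input row times W."""
    Z = np.zeros((H.shape[0], W.shape[1]))
    for i in range(H.shape[0]):
        Z[i, :] = H[i, :] @ W  # one dense row-vector x matrix product
    return Z

def aggregation_outer(A_csc, Z):
    """Outer product over source vertices: column j of A scales row j of Z;
    partial rows are merged (accumulated) into their destination rows,
    loosely mirroring a remote-merging style of aggregation."""
    out = np.zeros((A_csc.shape[0], Z.shape[1]))
    for j in range(A_csc.shape[1]):          # for each source vertex j
        col = A_csc.getcol(j)                # nonzeros of column j of A
        for i, a_ij in zip(col.indices, col.data):
            out[i, :] += a_ij * Z[j, :]      # merge partial result into row i
    return out

# Toy example: a 3-vertex path graph (unnormalized adjacency for brevity).
A = csc_matrix(np.array([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]], dtype=float))
H = np.random.rand(3, 4)   # input features
W = np.random.rand(4, 2)   # layer weights

Z = combination_rowwise(H, W)      # combination: H @ W
H_next = aggregation_outer(A, Z)   # aggregation: A @ (H @ W)
assert np.allclose(H_next, A @ (H @ W))
```

In this toy form, the outer-product aggregation generates one partial row per nonzero of A that must be merged at its destination, which is exactly the irregular, accumulation-heavy pattern the abstract attributes to the aggregation phase, while the combination remains a regular, row-local dense product.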
Pages: 1360-1373
Number of pages: 14