An Efficient GCNs Accelerator Using 3D-Stacked Processing-in-Memory Architectures

Times cited: 0

Authors:

Wang, Runze [1,2,3]
Hu, Ao [1,2,3]
Zheng, Long [1,2,3]
Wang, Qinggang [1,2,3]
Yuan, Jingrui [1,2,3]
Liu, Haifeng [1,2,3]
Yu, Linchen [4]
Liao, Xiaofei [1,2]
Jin, Hai [1,2]

Affiliations:

[1] Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China

[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China

[3] Graph Proc Res Ctr, Zhejiang Lab, Hangzhou 311121, Peoples R China

[4] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China

Source:

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS | 2024, Vol. 43, No. 5

Keywords:

3D-stacked memory; accelerators; graph convolutional networks (GCNs); processing-in-memory (PIM)

DOI:

10.1109/TCAD.2023.3341753

Chinese Library Classification number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Graph convolutional networks (GCNs) hold great promise in facilitating machine learning on graph-structured data. However, the sparsity of graphs often results in a significant number of irregular memory accesses, leading to inefficient data movement in existing GCN accelerators. With the advancement of 3D-stacked technology, the processing-in-memory (PIM) architecture has emerged as a promising solution for graph processing. Nevertheless, existing PIM accelerators face two challenges: irregular remote accesses in the aggregation phase of GCNs, and dynamic workload variations between phases. In this article, we present GCNim, a PIM accelerator based on 3D-stacked memory, which features two key innovations in its computation model and hardware design. First, we present a PIM-based hybrid computation model, which employs a remote merging strategy to achieve the outer product in aggregation and the row-wise product in combination. Second, GCNim builds a three-stage aggregation and combination pipeline and integrates unified processing elements (PEs) supporting these three stages at the bank level, achieving load balance among PEs through a lightweight data placement algorithm. Compared with state-of-the-art software frameworks running on CPUs and GPUs, GCNim achieves average speedups of 3,736.06x and 76.56x, respectively. Moreover, GCNim outperforms the state-of-the-art GCN hardware accelerators I-GCN, PEDAL, FlowGNN, and GCIM, with average speedups of 3.35x, 8.97x, 2.24x, and 5.58x, respectively.
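
The two product formulations named in the abstract can be illustrated with a small software sketch. This is not GCNim's implementation: the function names, the column-wise adjacency layout, and the toy graph are all assumptions made for illustration, and the remote merging, pipelining, and data placement described in the abstract are not modeled. It only shows that aggregation (A·X) can be written as a sum of outer products and combination (H·W) as a row-wise product.

```python
# Illustrative sketch only. All names and the data layout are assumptions,
# not taken from the GCNim paper.

def aggregate_outer_product(adj_columns, features, num_nodes):
    """Compute A @ X as a sum of outer products.

    adj_columns[k] holds the (row, value) nonzeros of column k of A.
    Column k of A times row k of X produces partial output rows,
    which are merged (accumulated) into the result.
    """
    feat_dim = len(features[0])
    out = [[0.0] * feat_dim for _ in range(num_nodes)]
    for k, column in enumerate(adj_columns):
        x_row = features[k]
        for row, value in column:
            for j in range(feat_dim):
                out[row][j] += value * x_row[j]
    return out


def combine_row_wise(aggregated, weights):
    """Compute H @ W row by row: output row i = sum over k of H[i][k] * W[k]."""
    out_dim = len(weights[0])
    out = []
    for h_row in aggregated:
        acc = [0.0] * out_dim
        for k, h in enumerate(h_row):
            if h == 0.0:
                continue  # skip zero entries, as a sparse kernel would
            w_row = weights[k]
            for j in range(out_dim):
                acc[j] += h * w_row[j]
        out.append(acc)
    return out


# Toy path graph 0-1-2 with self-loops (hypothetical data).
adj_columns = [
    [(0, 1.0), (1, 1.0)],            # column 0 of A
    [(0, 1.0), (1, 1.0), (2, 1.0)],  # column 1 of A
    [(1, 1.0), (2, 1.0)],            # column 2 of A
]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # X: 3 nodes, 2 features
weights = [[1.0], [2.0]]                          # W: 2 x 1

aggregated = aggregate_outer_product(adj_columns, features, num_nodes=3)
combined = combine_row_wise(aggregated, weights)
```

In the outer-product form each input feature row is read once and scattered to several output rows, which is why partial results must be merged; in the row-wise form each output row is produced independently from one input row.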


Pages: 1360-1373

Page count: 14
