An Efficient GCNs Accelerator Using 3D-Stacked Processing-in-Memory Architectures

Cited by: 0
Authors
Wang, Runze [1 ,2 ,3 ]
Hu, Ao [1 ,2 ,3 ]
Zheng, Long [1 ,2 ,3 ]
Wang, Qinggang [1 ,2 ,3 ]
Yuan, Jingrui [1 ,2 ,3 ]
Liu, Haifeng [1 ,2 ,3 ]
Yu, Linchen [4 ]
Liao, Xiaofei [1 ,2 ]
Jin, Hai [1 ,2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[3] Graph Proc Res Ctr, Zhejiang Lab, Hangzhou 311121, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China
Keywords
3D-stacked memory; accelerators; graph convolutional networks (GCNs); processing-in-memory (PIM);
DOI
10.1109/TCAD.2023.3341753
CLC Number
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
Graph convolutional networks (GCNs) hold great promise for machine learning on graph-structured data. However, graph sparsity often results in a large number of irregular memory accesses, leading to inefficient data movement in existing GCN accelerators. With advances in 3D-stacking technology, the processing-in-memory (PIM) architecture has emerged as a promising solution for graph processing. Nevertheless, existing PIM accelerators face the challenges of irregular remote accesses in the aggregation phase of GCNs and dynamic workload variation between phases. In this article, we present GCNim, a PIM accelerator based on 3D-stacked memory, which features two key innovations in its computation model and hardware design. First, we present a PIM-based hybrid computation model that employs a remote merging strategy to realize the outer product in aggregation and the row-wise product in combination. Second, GCNim builds a three-stage aggregation and combination pipeline and integrates unified processing elements (PEs) supporting these three stages at the bank level, achieving load balance among PEs through a lightweight data placement algorithm. Compared with state-of-the-art software frameworks running on CPUs and GPUs, GCNim achieves average speedups of 3,736.06x and 76.56x, respectively. Moreover, GCNim outperforms the state-of-the-art GCN hardware accelerators I-GCN, PEDAL, FlowGNN, and GCIM, with average speedups of 3.35x, 8.97x, 2.24x, and 5.58x, respectively.
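The abstract contrasts two dataflows for a GCN layer: a row-wise product for the dense combination and an outer product, with merging of partial rows, for the sparse aggregation. The sketch below is a minimal, CPU-side illustration of that split for a single layer H' = A(HW); it is not GCNim's implementation, and the function names and toy matrices are assumptions made for illustration only.

```python
import numpy as np
from scipy.sparse import csc_matrix

def combination_rowwise(H, W):
    """Row-wise product: each output row is one input row times W."""
    Z = np.zeros((H.shape[0], W.shape[1]))
    for i in range(H.shape[0]):
        Z[i, :] = H[i, :] @ W  # one dense row-vector x matrix product
    return Z

def aggregation_outer(A_csc, Z):
    """Outer product over source vertices: column j of A scales row j of Z;
    partial rows are merged (accumulated) into their destination rows,
    loosely mirroring a remote-merging style of aggregation."""
    out = np.zeros((A_csc.shape[0], Z.shape[1]))
    for j in range(A_csc.shape[1]):          # for each source vertex j
        col = A_csc.getcol(j)                # nonzeros of column j of A
        for i, a_ij in zip(col.indices, col.data):
            out[i, :] += a_ij * Z[j, :]      # merge partial result into row i
    return out

# Toy example: a 3-vertex path graph (unnormalized adjacency for brevity).
A = csc_matrix(np.array([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]], dtype=float))
H = np.random.rand(3, 4)   # input features
W = np.random.rand(4, 2)   # layer weights

Z = combination_rowwise(H, W)      # combination: H @ W
H_next = aggregation_outer(A, Z)   # aggregation: A @ (H @ W)
assert np.allclose(H_next, A @ (H @ W))
```

In this toy form, the outer-product aggregation generates one partial row per nonzero of A that must be merged at its destination, which is exactly the irregular, accumulation-heavy pattern the abstract attributes to the aggregation phase, while the combination remains a regular, row-local dense product.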
Pages: 1360-1373
Number of pages: 14