An Efficient GCNs Accelerator Using 3D-Stacked Processing-in-Memory Architectures

Authors
Wang, Runze [1, 2, 3]
Hu, Ao [1, 2, 3]
Zheng, Long [1, 2, 3]
Wang, Qinggang [1, 2, 3]
Yuan, Jingrui [1, 2, 3]
Liu, Haifeng [1, 2, 3]
Yu, Linchen [4]
Liao, Xiaofei [1, 2]
Jin, Hai [1, 2]
Affiliations
[1] Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Natl Engn Res Ctr Big Data Technol & Syst, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[3] Graph Proc Res Ctr, Zhejiang Lab, Hangzhou 311121, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China
Keywords
3D-stacked memory; accelerators; graph convolutional networks (GCNs); processing-in-memory (PIM)
DOI
10.1109/TCAD.2023.3341753
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Graph convolutional networks (GCNs) hold great promise in facilitating machine learning on graph-structured data. However, the sparsity of graphs often results in a significant number of irregular memory accesses, leading to inefficient data movement in existing GCN accelerators. With the advancement of 3D-stacked technology, the processing-in-memory (PIM) architecture has emerged as a promising solution for graph processing. Nevertheless, existing PIM accelerators face two challenges: irregular remote accesses in the aggregation phase of GCNs and dynamic workload variations between phases. In this article, we present GCNim, a PIM accelerator based on 3D-stacked memory, which features two key innovations in its computation model and hardware design. First, we present a PIM-based hybrid computation model, which employs a remote merging strategy to achieve the outer product in aggregation and the row-wise product in combination. Second, GCNim builds a three-stage aggregation and combination pipeline and integrates unified processing elements (PEs) supporting these three stages at the bank level, achieving load balance among PEs through a lightweight data placement algorithm. Compared with the state-of-the-art software frameworks running on CPUs and GPUs, GCNim achieves average speedups of 3,736.06x and 76.56x, respectively. Moreover, GCNim outperforms the state-of-the-art GCN hardware accelerators I-GCN, PEDAL, FlowGNN, and GCIM, with average speedups of 3.35x, 8.97x, 2.24x, and 5.58x, respectively.
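The two dataflows named in the abstract can be illustrated in software. The sketch below is not GCNim's implementation: it only shows, in plain Python, what an outer-product aggregation (A·X) followed by a row-wise-product combination ((A·X)·W) computes. The function names, the edge-list format, and the example data are assumptions made for illustration, and the dictionary-based merge merely stands in for the paper's remote merging strategy, whose hardware details are not given in this record.

```python
from collections import defaultdict


def aggregate_outer_product(edges, features):
    """Compute A @ X with an outer-product dataflow (illustrative only).

    edges    -- list of (row, col, weight) nonzeros of a square adjacency matrix A
    features -- list of feature rows, features[k] is row k of X
    Column k of A is paired with row k of X; the resulting partial rows
    are merged (summed) by destination row.
    """
    n_feat = len(features[0])
    # Group nonzeros by column so each column meets its feature row once.
    by_col = defaultdict(list)
    for r, c, w in edges:
        by_col[c].append((r, w))

    # Merge buffer: one accumulator per destination row.
    merged = defaultdict(lambda: [0.0] * n_feat)
    for k, column in by_col.items():
        x_k = features[k]
        for r, w in column:
            acc = merged[r]
            for j in range(n_feat):
                acc[j] += w * x_k[j]

    # Rows without incoming nonzeros stay all-zero.
    return [merged[i] if i in merged else [0.0] * n_feat
            for i in range(len(features))]


def combine_row_wise(aggregated, weights):
    """Compute aggregated @ W one output row at a time (row-wise product)."""
    n_out = len(weights[0])
    out = []
    for row in aggregated:
        acc = [0.0] * n_out
        for k, a in enumerate(row):
            if a == 0.0:
                continue  # skip zero entries, as a sparse dataflow would
            w_k = weights[k]
            for j in range(n_out):
                acc[j] += a * w_k[j]
        out.append(acc)
    return out


# Hypothetical 3-node path graph 0 - 1 - 2 with unit edge weights.
edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, 2.0], [3.0, 4.0]]

aggregated = aggregate_outer_product(edges, features)
combined = combine_row_wise(aggregated, weights)
```

In this toy example the outer-product step touches each feature row exactly once, at the cost of scattered writes to the merge buffer, while the row-wise step finishes one output row before starting the next. The sketch leaves out everything hardware-related: memory banks, PEs, the pipeline, and data placement.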
Pages: 1360-1373
Page count: 14