PIMFlow: Compiler and Runtime Support for CNN Models on Processing-in-Memory DRAM

Cited by: 2
Authors
Shin, Yongwon [1 ]
Park, Juseong [2 ]
Cho, Sungjun [2 ]
Sung, Hyojin [1 ,2 ]
Affiliations
[1] POSTECH, Grad Sch AI, Pohang, South Korea
[2] POSTECH, Dept Comp Sci & Engn, Pohang, South Korea
Funding
National Research Foundation, Singapore
Keywords
Processing-in-memory; CNN models;
DOI
10.1145/3579990.3580009
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Processing-in-Memory (PIM) has evolved over decades into a feasible approach to addressing the worsening performance bottleneck of main memory by placing computational logic in or near memory. Recent proposals from DRAM manufacturers highlight hardware-constraint-aware designs of PIM-enabled DRAM with specialized MAC logic, providing an order-of-magnitude speedup for memory-intensive operations in DL models. Although the main targets for PIM acceleration did not initially include convolutional neural networks due to their high compute intensity, recent CNN models increasingly adopt computationally lightweight implementations. Motivated by the potential for the software stack to enable CNN models on DRAM-PIM hardware without invasive changes, we propose PIMFlow, end-to-end compiler and runtime support for accelerating CNN models on PIM-enabled GPU memory. PIMFlow transforms model graphs to create inter-node parallelism across GPU and PIM, explores possible task- and data-parallel execution scenarios for optimal execution time, and provides a code-generating back end and execution engine for DRAM-PIM. PIMFlow achieves up to 82% end-to-end speedup and reduces energy consumption by 26% on average for CNN model inference.
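The abstract describes exploring data-parallel execution scenarios that split work between the GPU and PIM so both devices run concurrently. A minimal sketch of that idea, assuming a simple linear cost model and hypothetical per-channel time estimates (`best_split` and its parameters are illustrative, not PIMFlow's actual API): partition a layer's output channels so that the slower of the two devices finishes as early as possible.

```python
# Hypothetical sketch: choose how many of a layer's output channels to
# assign to the GPU vs. the PIM-enabled memory, assuming a linear cost
# model (time proportional to channels on each device). The devices run
# in parallel, so the layer's latency is the max of the two sides.

def best_split(total_channels, gpu_time_per_ch, pim_time_per_ch):
    """Return (gpu_channels, pim_channels, est_time) minimizing
    max(GPU-side time, PIM-side time)."""
    # Baseline: run everything on the GPU.
    best = (total_channels, 0, total_channels * gpu_time_per_ch)
    for gpu_ch in range(total_channels + 1):
        pim_ch = total_channels - gpu_ch
        t = max(gpu_ch * gpu_time_per_ch, pim_ch * pim_time_per_ch)
        if t < best[2]:
            best = (gpu_ch, pim_ch, t)
    return best

# Example: 64 output channels; PIM is 3x slower per channel here,
# so the GPU takes the larger share and both sides finish together.
print(best_split(64, 1.0, 3.0))  # -> (48, 16, 48.0)
```

In a real system the cost model would come from profiling each candidate split (as PIMFlow's exploration of execution scenarios suggests), not from a fixed per-channel constant.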
Pages: 249-262 (14 pages)