PIMFlow: Compiler and Runtime Support for CNN Models on Processing-in-Memory DRAM

Cited by: 2
Authors
Shin, Yongwon [1 ]
Park, Juseong [2 ]
Cho, Sungjun [2 ]
Sung, Hyojin [1 ,2 ]
Affiliations
[1] POSTECH, Grad Sch AI, Pohang, South Korea
[2] POSTECH, Dept Comp Sci & Engn, Pohang, South Korea
Funding
National Research Foundation, Singapore
Keywords
Processing-in-memory; CNN models;
DOI
10.1145/3579990.3580009
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Processing-in-Memory (PIM) has evolved over decades into a feasible approach to addressing the worsening performance bottleneck of main memory by placing computational logic in or near memory. Recent proposals from DRAM manufacturers highlight hardware-constraint-aware designs of PIM-enabled DRAM with specialized MAC logic, providing an order-of-magnitude speedup for memory-intensive operations in DL models. Although the main targets for PIM acceleration did not initially include convolutional neural networks due to their high compute intensity, recent CNN models increasingly adopt computationally lightweight implementations. Motivated by the potential for the software stack to enable CNN models on DRAM-PIM hardware without invasive changes, we propose PIMFlow, end-to-end compiler and runtime support for accelerating CNN models on PIM-enabled GPU memory. PIMFlow transforms model graphs to create inter-node parallelism across GPU and PIM, explores possible task- and data-parallel execution scenarios for optimal execution time, and provides a code-generating back end and execution engine for DRAM-PIM. PIMFlow achieves up to 82% end-to-end speedup and reduces energy consumption by 26% on average for CNN model inference.
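The abstract describes exploring data-parallel execution scenarios that split work between the GPU and PIM so both devices run concurrently. A minimal sketch of that idea, assuming a simple linear cost model and hypothetical per-channel time estimates (`best_split` and its parameters are illustrative, not PIMFlow's actual API): partition a layer's output channels so that the slower of the two devices finishes as early as possible.

```python
# Hypothetical sketch: choose how many of a layer's output channels to
# assign to the GPU vs. the PIM-enabled memory, assuming a linear cost
# model (time proportional to channels on each device). The devices run
# in parallel, so the layer's latency is the max of the two sides.

def best_split(total_channels, gpu_time_per_ch, pim_time_per_ch):
    """Return (gpu_channels, pim_channels, est_time) minimizing
    max(GPU-side time, PIM-side time)."""
    # Baseline: run everything on the GPU.
    best = (total_channels, 0, total_channels * gpu_time_per_ch)
    for gpu_ch in range(total_channels + 1):
        pim_ch = total_channels - gpu_ch
        t = max(gpu_ch * gpu_time_per_ch, pim_ch * pim_time_per_ch)
        if t < best[2]:
            best = (gpu_ch, pim_ch, t)
    return best

# Example: 64 output channels; PIM is 3x slower per channel here,
# so the GPU takes the larger share and both sides finish together.
print(best_split(64, 1.0, 3.0))  # -> (48, 16, 48.0)
```

In a real system the cost model would come from profiling each candidate split (as PIMFlow's exploration of execution scenarios suggests), not from a fixed per-channel constant.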
Pages: 249-262 (14 pages)