Occamy: Memory-efficient GPU Compiler for DNN Inference

Cited by: 5
Authors
Lee, Jaeho [1 ]
Jeong, Shinnung [1 ]
Song, Seungbin [1 ]
Kim, Kunwoo [1 ]
Choi, Heelim [1 ]
Kim, Youngsok [1 ]
Kim, Hanjun [1 ]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
DOI: 10.1109/DAC56929.2023.10247839
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
This work proposes Occamy, a memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy. For each DNN operation, Occamy analyzes the dimensions of the input and output tensors and their liveness within the operation. Across all operations, Occamy analyzes the liveness of every tensor, builds a memory pool sized to the maximum required memory, and schedules when and where to place each tensor in the pool. Compared with PyTorch on an integrated embedded GPU across six DNNs, Occamy reduces memory usage by 34.6% and achieves a geometric-mean speedup of 1.25x.
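The core idea the abstract describes, placing tensors at overlapping offsets in one memory pool whenever their liveness intervals do not overlap, can be illustrated with a small greedy planner. This is a hypothetical sketch, not Occamy's actual algorithm; the tensor list, interval representation, and best-fit heuristic are all illustrative assumptions.

```python
# Sketch of liveness-based tensor placement in a shared memory pool
# (illustrative only; not Occamy's actual algorithm). Each tensor has a
# byte size and a [first_use, last_use] liveness interval over the
# operation schedule; tensors with overlapping intervals must not share bytes.

def plan_pool(tensors):
    """tensors: list of (name, size, first_use, last_use).
    Returns (pool_size, {name: offset}) via a greedy first-fit placement."""
    placed = []   # (offset, size, first_use, last_use)
    offsets = {}
    # Place larger tensors first to reduce fragmentation.
    for name, size, first, last in sorted(tensors, key=lambda t: -t[1]):
        # Byte ranges occupied by tensors whose lifetimes overlap this one.
        busy = sorted((off, off + sz) for off, sz, f, l in placed
                      if not (last < f or l < first))
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break              # found a gap before this busy range
            offset = max(offset, hi)
        placed.append((offset, size, first, last))
        offsets[name] = offset
    pool_size = max((off + sz for off, sz, _, _ in placed), default=0)
    return pool_size, offsets

# Example: a 3-tensor chain where t0 dies after op 1, so t2 reuses its bytes.
tensors = [("t0", 1024, 0, 1), ("t1", 2048, 1, 2), ("t2", 1024, 2, 3)]
size, layout = plan_pool(tensors)
# Pool is 3072 bytes instead of the naive 4096: t2 lands at t0's offset.
```

Here the planner sizes the pool to the maximum memory simultaneously live (t0 plus t1), mirroring the abstract's "maximum required memory size" calculation, and reuse falls out of the non-overlapping intervals.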
Pages: 6