Occamy: Memory-efficient GPU Compiler for DNN Inference

Cited by: 5
Authors
Lee, Jaeho [1 ]
Jeong, Shinnung [1 ]
Song, Seungbin [1 ]
Kim, Kunwoo [1 ]
Choi, Heelim [1 ]
Kim, Youngsok [1 ]
Kim, Hanjun [1 ]
Institutions
[1] Yonsei Univ, Seoul, South Korea
DOI
10.1109/DAC56929.2023.10247839
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy. For each DNN operation, Occamy analyzes the dimensions of the input and output tensors and their liveness within the operation. Across all operations, Occamy analyzes the liveness of every tensor, generates a memory pool after calculating the maximum required memory size, and schedules when and where to place each tensor in the pool. Compared to PyTorch on an integrated embedded GPU across six DNNs, Occamy reduces memory usage by 34.6% and achieves a geometric-mean speedup of 1.25x.
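The abstract describes a classic memory-planning pipeline: compute each tensor's live range, then pack all tensors into one shared pool so that only simultaneously-live tensors receive disjoint space. Below is a minimal, hypothetical Python sketch of that idea using a greedy largest-first, first-fit placement; the names, data layout, and placement policy are illustrative assumptions, not Occamy's actual analysis or scheduler.

from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size: int        # bytes
    first_use: int   # index of the operation that defines the tensor
    last_use: int    # index of the operation that last reads it

def plan_memory_pool(tensors):
    """Assign each tensor an offset in one shared pool so that tensors
    with overlapping live ranges never overlap in memory; return the
    offsets and the pool size (the maximum required memory)."""
    offsets = {}
    placed = []  # (offset, size, first_use, last_use) of placed tensors
    # Placing larger tensors first tends to pack the pool more tightly.
    for t in sorted(tensors, key=lambda t: t.size, reverse=True):
        # Collect pool regions whose owners' live ranges overlap t's.
        conflicts = sorted(
            (off, off + size)
            for off, size, first, last in placed
            if not (last < t.first_use or first > t.last_use)
        )
        # First-fit: slide past each conflicting region until t fits.
        offset = 0
        for start, end in conflicts:
            if offset + t.size <= start:
                break
            offset = max(offset, end)
        offsets[t.name] = offset
        placed.append((offset, t.size, t.first_use, t.last_use))
    pool_size = max((offsets[t.name] + t.size for t in tensors), default=0)
    return offsets, pool_size

# Toy schedule: a is dead before c becomes live, so c can reuse a's slot.
tensors = [Tensor("a", 1024, 0, 1),
           Tensor("b", 2048, 1, 3),
           Tensor("c", 1024, 2, 3)]
offsets, pool_size = plan_memory_pool(tensors)
print(offsets)    # {'b': 0, 'a': 2048, 'c': 2048}
print(pool_size)  # 3072

In this toy run, b occupies [0, 2048) during ops 1-3, a takes [2048, 3072) during ops 0-1, and c reuses a's slot once a is dead, so the pool peaks at 3,072 bytes instead of the 4,096 bytes a non-reusing allocator would need.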
Pages: 6
Related Papers (showing items [21]-[30] of 50)
  • [21] Performance Trade-offs in Weight Quantization for Memory-Efficient Inference
    Tostado, Pablo M.
    Pedroni, Bruno U.
    Cauwenberghs, Gert
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 246 - 250
  • [22] Evolutionary Bin Packing for Memory-Efficient Dataflow Inference Acceleration on FPGA
    Kroes, Mairin
    Petrica, Lucian
    Cotofana, Sorin
    Blott, Michaela
    GECCO'20: PROCEEDINGS OF THE 2020 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2020, : 1125 - 1133
  • [23] Coordinated Batching and DVFS for DNN Inference on GPU Accelerators
    Nabavinejad, Seyed Morteza
    Reda, Sherief
    Ebrahimi, Masoumeh
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (10) : 2496 - 2508
  • [24] A Memory-Efficient Edge Inference Accelerator with XOR-based Model Compression
    Lee, Hyunseung
    Hong, Jihoon
    Kim, Soosung
    Lee, Seung Yul
    Lee, Jae W.
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023,
  • [25] PENETRALIUM: Privacy-preserving and memory-efficient neural network inference at the edge
    Yang, Mengda
    Yi, Wenzhe
    Wang, Juan
    Hu, Hongxin
    Xu, Xiaoyang
    Li, Ziang
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 156 : 30 - 41
  • [26] Buffer Sizes Reduction for Memory-efficient CNN Inference on Mobile and Embedded Devices
    Minakova, Svetlana
    Stefanov, Todor
    2020 23RD EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2020), 2020, : 133 - 140
  • [27] dCSR: A Memory-Efficient Sparse Matrix Representation for Parallel Neural Network Inference
    Trommer, Elias
    Waschneck, Bernd
    Kumar, Akash
    2021 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN (ICCAD), 2021,
  • [29] Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support
    Erbert, Marius
    Rechner, Steffen
    Mueller-Hannemann, Matthias
    ALGORITHMS IN BIOINFORMATICS, 2016, 9838 : 150 - 161
  • [30] Memory-Efficient GPU Volume Path Tracing of AMR Data Using the Dual Mesh
    Zellmann, Stefan
    Wu, Qi
    Ma, Kwan-Liu
    Wald, Ingo
    COMPUTER GRAPHICS FORUM, 2023, 42 (03) : 51 - 62