Occamy: Memory-efficient GPU Compiler for DNN Inference

Cited by: 5
Authors
Lee, Jaeho [1 ]
Jeong, Shinnung [1 ]
Song, Seungbin [1 ]
Kim, Kunwoo [1 ]
Choi, Heelim [1 ]
Kim, Youngsok [1 ]
Kim, Hanjun [1 ]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
DOI
10.1109/DAC56929.2023.10247839
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy. For each DNN operation, Occamy analyzes the dimensions of the input and output tensors and their liveness within the operation. Across all operations, Occamy analyzes the liveness of every tensor, calculates the maximum required memory size, generates a memory pool of that size, and schedules when and where to place each tensor within the pool. Compared to PyTorch, on an integrated embedded GPU across six DNNs, Occamy reduces memory usage by 34.6% and achieves a geometric mean speedup of 1.25x.
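The core idea in the abstract (compute per-tensor liveness intervals across the operation schedule, then pack tensors into one memory pool sized to the peak requirement) can be illustrated with a small sketch. This is not Occamy's actual algorithm; the operation format, the first-fit packing heuristic, and the toy three-layer network below are illustrative assumptions.

```python
# Sketch of liveness-based tensor placement in a shared memory pool,
# in the spirit of the approach the abstract describes. All names and
# the greedy heuristic are hypothetical, not taken from Occamy.

def liveness_intervals(ops):
    """ops: list of (inputs, outputs) tensor-name tuples in execution order.
    Returns {tensor: (first_step, last_step)} liveness intervals."""
    first, last = {}, {}
    for step, (ins, outs) in enumerate(ops):
        for t in list(outs) + list(ins):
            first.setdefault(t, step)
            last[t] = step
    return {t: (first[t], last[t]) for t in first}

def assign_offsets(intervals, sizes):
    """Greedy first-fit: place each tensor at the lowest pool offset that
    avoids every already-placed tensor whose lifetime overlaps its own.
    Returns ({tensor: offset}, pool_size)."""
    placed = []   # (offset, size, (start, end))
    offsets = {}
    # Placing larger, earlier-born tensors first tends to pack better.
    order = sorted(intervals, key=lambda t: (-sizes[t], intervals[t][0]))
    for t in order:
        s, e = intervals[t]
        candidate = 0
        for off, sz, (ps, pe) in sorted(placed):
            lifetimes_overlap = ps <= e and s <= pe
            memory_overlap = off < candidate + sizes[t] and candidate < off + sz
            if lifetimes_overlap and memory_overlap:
                candidate = off + sz    # slide past the conflicting tensor
        offsets[t] = candidate
        placed.append((candidate, sizes[t], (s, e)))
    pool_size = max((offsets[t] + sizes[t] for t in offsets), default=0)
    return offsets, pool_size

# Toy 3-op chain x -> a -> b -> y: "x" dies after the first op and "a"
# after the second, so later tensors can reuse their slots in the pool.
ops = [(["x"], ["a"]), (["a"], ["b"]), (["b"], ["y"])]
sizes = {"x": 100, "a": 100, "b": 100, "y": 100}
offsets, pool = assign_offsets(liveness_intervals(ops), sizes)
print(pool)  # → 200, versus 400 for naive per-tensor allocation
```

The pool size here is the peak of concurrently-live bytes, which is what makes a single pre-allocated pool cheaper than allocating each tensor separately.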
Pages: 6
Related Papers
50 records in total
  • [1] Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices
    Xie, Xueshuo
    Wang, Haoxu
    Jian, Zhaolong
    Li, Tao
    Wang, Wei
    Xu, Zhiwei
    Wang, Guiling
    IEEE INFOCOM 2024-IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2024, : 2009 - 2018
  • [2] Balanced Sparsity for Efficient DNN Inference on GPU
    Yao, Zhuliang
    Cao, Shijie
    Xiao, Wencong
    Zhang, Chen
    Nie, Lanshun
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 5676 - 5683
  • [3] Smart-DNN+: A Memory-efficient Neural Networks Compression Framework for the Model Inference
    Wu, Donglei
    Yang, Weihao
    Zou, Xiangyu
    Xia, Wen
    Li, Shiyi
    Hu, Zhenbo
    Zhang, Weizhe
    Fang, Binxing
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2023, 20 (04)
  • [4] Memory-Efficient Pipeline-Parallel DNN Training
    Narayanan, Deepak
    Phanishayee, Amar
    Shi, Kaiyu
    Chen, Xie
    Zaharia, Matei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Activation Sequence Caching: High-Throughput and Memory-Efficient Generative Inference with a Single GPU
    Kim, Sowoong
    Sim, Eunyeong
    Shin, Youngsam
    Cho, YeonGon
    Baek, Woongki
    PROCEEDINGS OF THE 2024 THE INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2024, 2024, : 78 - 90
  • [6] TinyTS: Memory-Efficient TinyML Model Compiler Framework on Microcontrollers
    Liu, Yu-Yuan
    Zheng, Hong-Sheng
    Hu, Yu-Fang
    Hsu, Chen-Fong
    Yeh, Tsung Tai
    2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024, 2024, : 848 - 860
  • [7] Memory-Efficient Belief Propagation In Stereo Matching on GPU
    Choi, Young-kyu
    Williem
    Park, In Kyu
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [8] Memory-Efficient Dataflow Inference for Deep CNNs on FPGA
    Petrica, Lucian
    Alonso, Tobias
    Kroes, Mairin
    Fraser, Nicholas
    Cotofana, Sorin
    Blott, Michaela
    2020 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT 2020), 2020, : 48 - 55
  • [9] SqueezeNeRF: Further factorized FastNeRF for memory-efficient inference
    Wadhwani, Krishna
    Kojima, Tamaki
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 2716 - 2724
  • [10] Using Image Morphing for Memory-Efficient Impostor Rendering on GPU
    Yuksel, Kamer Ali
    Ercil, Aytul
    Yucebilgin, Alp
    Balcisoy, Selim
    2011 INTERNATIONAL CONFERENCE ON CYBERWORLDS, 2011, : 197 - 202