Occamy: Memory-efficient GPU Compiler for DNN Inference

Cited by: 5
Authors
Lee, Jaeho [1 ]
Jeong, Shinnung [1 ]
Song, Seungbin [1 ]
Kim, Kunwoo [1 ]
Choi, Heelim [1 ]
Kim, Youngsok [1 ]
Kim, Hanjun [1 ]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
DOI
10.1109/DAC56929.2023.10247839
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy. For each DNN operation, Occamy analyzes the dimensions of the input and output tensors and their liveness within the operation. Across all operations, Occamy analyzes the liveness of every tensor, calculates the maximum required memory size, generates a memory pool of that size, and schedules when and where to place each tensor within the pool. Compared to PyTorch, on an integrated embedded GPU across six DNNs, Occamy reduces memory usage by 34.6% and achieves a geometric mean speedup of 1.25x.
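The core idea in the abstract (compute per-tensor liveness intervals across the operation schedule, then pack tensors into one memory pool sized to the peak requirement) can be illustrated with a small sketch. This is not Occamy's actual algorithm; the operation format, the first-fit packing heuristic, and the toy three-layer network below are illustrative assumptions.

```python
# Sketch of liveness-based tensor placement in a shared memory pool,
# in the spirit of the approach the abstract describes. All names and
# the greedy heuristic are hypothetical, not taken from Occamy.

def liveness_intervals(ops):
    """ops: list of (inputs, outputs) tensor-name tuples in execution order.
    Returns {tensor: (first_step, last_step)} liveness intervals."""
    first, last = {}, {}
    for step, (ins, outs) in enumerate(ops):
        for t in list(outs) + list(ins):
            first.setdefault(t, step)
            last[t] = step
    return {t: (first[t], last[t]) for t in first}

def assign_offsets(intervals, sizes):
    """Greedy first-fit: place each tensor at the lowest pool offset that
    avoids every already-placed tensor whose lifetime overlaps its own.
    Returns ({tensor: offset}, pool_size)."""
    placed = []   # (offset, size, (start, end))
    offsets = {}
    # Placing larger, earlier-born tensors first tends to pack better.
    order = sorted(intervals, key=lambda t: (-sizes[t], intervals[t][0]))
    for t in order:
        s, e = intervals[t]
        candidate = 0
        for off, sz, (ps, pe) in sorted(placed):
            lifetimes_overlap = ps <= e and s <= pe
            memory_overlap = off < candidate + sizes[t] and candidate < off + sz
            if lifetimes_overlap and memory_overlap:
                candidate = off + sz    # slide past the conflicting tensor
        offsets[t] = candidate
        placed.append((candidate, sizes[t], (s, e)))
    pool_size = max((offsets[t] + sizes[t] for t in offsets), default=0)
    return offsets, pool_size

# Toy 3-op chain x -> a -> b -> y: "x" dies after the first op and "a"
# after the second, so later tensors can reuse their slots in the pool.
ops = [(["x"], ["a"]), (["a"], ["b"]), (["b"], ["y"])]
sizes = {"x": 100, "a": 100, "b": 100, "y": 100}
offsets, pool = assign_offsets(liveness_intervals(ops), sizes)
print(pool)  # → 200, versus 400 for naive per-tensor allocation
```

The pool size here is the peak of concurrently-live bytes, which is what makes a single pre-allocated pool cheaper than allocating each tensor separately.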
Pages: 6
Related Papers
50 records in total
  • [1] Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices
    Xie, Xueshuo
    Wang, Haoxu
    Jian, Zhaolong
    Li, Tao
    Wang, Wei
    Xu, Zhiwei
    Wang, Guiling
    IEEE INFOCOM 2024-IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2024, : 2009 - 2018
  • [2] Balanced Sparsity for Efficient DNN Inference on GPU
    Yao, Zhuliang
    Cao, Shijie
    Xiao, Wencong
    Zhang, Chen
    Nie, Lanshun
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 5676 - 5683
  • [3] Smart-DNN+: A Memory-efficient Neural Networks Compression Framework for the Model Inference
    Wu, Donglei
    Yang, Weihao
    Zou, Xiangyu
    Xia, Wen
    Li, Shiyi
    Hu, Zhenbo
    Zhang, Weizhe
    Fang, Binxing
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2023, 20 (04)
  • [4] Memory-Efficient Pipeline-Parallel DNN Training
    Narayanan, Deepak
    Phanishayee, Amar
    Shi, Kaiyu
    Chen, Xie
    Zaharia, Matei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Activation Sequence Caching: High-Throughput and Memory-Efficient Generative Inference with a Single GPU
    Kim, Sowoong
    Sim, Eunyeong
    Shin, Youngsam
    Cho, YeonGon
    Baek, Woongki
    PROCEEDINGS OF THE 2024 THE INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2024, 2024, : 78 - 90
  • [6] TinyTS: Memory-Efficient TinyML Model Compiler Framework on Microcontrollers
    Liu, Yu-Yuan
    Zheng, Hong-Sheng
    Hu, Yu-Fang
    Hsu, Chen-Fong
    Yeh, Tsung Tai
    2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024, 2024, : 848 - 860
  • [7] Memory-Efficient Belief Propagation In Stereo Matching on GPU
    Choi, Young-kyu
    Williem
    Park, In Kyu
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [8] Memory-Efficient Dataflow Inference for Deep CNNs on FPGA
    Petrica, Lucian
    Alonso, Tobias
    Kroes, Mairin
    Fraser, Nicholas
    Cotofana, Sorin
    Blott, Michaela
    2020 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT 2020), 2020, : 48 - 55
  • [9] SqueezeNeRF: Further factorized FastNeRF for memory-efficient inference
    Wadhwani, Krishna
    Kojima, Tamaki
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 2716 - 2724
  • [10] Using Image Morphing for Memory-Efficient Impostor Rendering on GPU
    Yuksel, Kamer Ali
    Ercil, Aytul
    Yucebilgin, Alp
    Balcisoy, Selim
    2011 INTERNATIONAL CONFERENCE ON CYBERWORLDS, 2011, : 197 - 202