Occamy: Memory-efficient GPU Compiler for DNN Inference

Cited by: 5
Authors
Lee, Jaeho [1 ]
Jeong, Shinnung [1 ]
Song, Seungbin [1 ]
Kim, Kunwoo [1 ]
Choi, Heelim [1 ]
Kim, Youngsok [1 ]
Kim, Hanjun [1 ]
Institutions
[1] Yonsei Univ, Seoul, South Korea
DOI
10.1109/DAC56929.2023.10247839
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This work proposes Occamy, a new memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy. For each DNN operation, Occamy analyzes the dimensions of the input and output tensors and their liveness within the operation. Across all operations, Occamy analyzes the liveness of every tensor, generates a memory pool after calculating the maximum required memory size, and schedules when and where to place each tensor in the pool. Compared to PyTorch on an integrated embedded GPU across six DNNs, Occamy reduces memory usage by 34.6% and achieves a geometric-mean speedup of 1.25x.
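The abstract describes a classic memory-planning pipeline: compute each tensor's live range, then pack all tensors into one shared pool so that only simultaneously-live tensors receive disjoint space. Below is a minimal, hypothetical Python sketch of that idea using a greedy largest-first, first-fit placement; the names, data layout, and placement policy are illustrative assumptions, not Occamy's actual analysis or scheduler.

from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size: int        # bytes
    first_use: int   # index of the operation that defines the tensor
    last_use: int    # index of the operation that last reads it

def plan_memory_pool(tensors):
    """Assign each tensor an offset in one shared pool so that tensors
    with overlapping live ranges never overlap in memory; return the
    offsets and the pool size (the maximum required memory)."""
    offsets = {}
    placed = []  # (offset, size, first_use, last_use) of placed tensors
    # Placing larger tensors first tends to pack the pool more tightly.
    for t in sorted(tensors, key=lambda t: t.size, reverse=True):
        # Collect pool regions whose owners' live ranges overlap t's.
        conflicts = sorted(
            (off, off + size)
            for off, size, first, last in placed
            if not (last < t.first_use or first > t.last_use)
        )
        # First-fit: slide past each conflicting region until t fits.
        offset = 0
        for start, end in conflicts:
            if offset + t.size <= start:
                break
            offset = max(offset, end)
        offsets[t.name] = offset
        placed.append((offset, t.size, t.first_use, t.last_use))
    pool_size = max((offsets[t.name] + t.size for t in tensors), default=0)
    return offsets, pool_size

# Toy schedule: a is dead before c becomes live, so c can reuse a's slot.
tensors = [Tensor("a", 1024, 0, 1),
           Tensor("b", 2048, 1, 3),
           Tensor("c", 1024, 2, 3)]
offsets, pool_size = plan_memory_pool(tensors)
print(offsets)    # {'b': 0, 'a': 2048, 'c': 2048}
print(pool_size)  # 3072

In this toy run, b occupies [0, 2048) during ops 1-3, a takes [2048, 3072) during ops 0-1, and c reuses a's slot once a is dead, so the pool peaks at 3,072 bytes instead of the 4,096 bytes a non-reusing allocator would need.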
Pages: 6
Related Papers (showing items [21]-[30] of 50)
  • [21] Performance Trade-offs in Weight Quantization for Memory-Efficient Inference
    Tostado, Pablo M.
    Pedroni, Bruno U.
    Cauwenberghs, Gert
    2019 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS 2019), 2019, : 246 - 250
  • [22] Evolutionary Bin Packing for Memory-Efficient Dataflow Inference Acceleration on FPGA
    Kroes, Mairin
    Petrica, Lucian
    Cotofana, Sorin
    Blott, Michaela
    GECCO'20: PROCEEDINGS OF THE 2020 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2020, : 1125 - 1133
  • [23] Coordinated Batching and DVFS for DNN Inference on GPU Accelerators
    Nabavinejad, Seyed Morteza
    Reda, Sherief
    Ebrahimi, Masoumeh
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (10) : 2496 - 2508
  • [24] A Memory-Efficient Edge Inference Accelerator with XOR-based Model Compression
    Lee, Hyunseung
    Hong, Jihoon
    Kim, Soosung
    Lee, Seung Yul
    Lee, Jae W.
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023,
  • [25] PENETRALIUM: Privacy-preserving and memory-efficient neural network inference at the edge
    Yang, Mengda
    Yi, Wenzhe
    Wang, Juan
    Hu, Hongxin
    Xu, Xiaoyang
    Li, Ziang
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2024, 156 : 30 - 41
  • [26] Buffer Sizes Reduction for Memory-efficient CNN Inference on Mobile and Embedded Devices
    Minakova, Svetlana
    Stefanov, Todor
    2020 23RD EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD 2020), 2020, : 133 - 140
  • [27] dCSR: A Memory-Efficient Sparse Matrix Representation for Parallel Neural Network Inference
    Trommer, Elias
    Waschneck, Bernd
    Kumar, Akash
    2021 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN (ICCAD), 2021,
  • [29] Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support
    Erbert, Marius
    Rechner, Steffen
    Mueller-Hannemann, Matthias
    ALGORITHMS IN BIOINFORMATICS, 2016, 9838 : 150 - 161
  • [30] Memory-Efficient GPU Volume Path Tracing of AMR Data Using the Dual Mesh
    Zellmann, Stefan
    Wu, Qi
    Ma, Kwan-Liu
    Wald, Ingo
    COMPUTER GRAPHICS FORUM, 2023, 42 (03) : 51 - 62