Occamy: Memory-efficient GPU Compiler for DNN Inference

Cited by: 5
Authors
Lee, Jaeho [1 ]
Jeong, Shinnung [1 ]
Song, Seungbin [1 ]
Kim, Kunwoo [1 ]
Choi, Heelim [1 ]
Kim, Youngsok [1 ]
Kim, Hanjun [1 ]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
DOI: 10.1109/DAC56929.2023.10247839
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
This work proposes Occamy, a memory-efficient DNN compiler that reduces the memory usage of a DNN model without affecting its accuracy. For each DNN operation, Occamy analyzes the dimensions of the input and output tensors and their liveness within the operation. Across all operations, Occamy analyzes the liveness of every tensor, builds a memory pool sized to the maximum required memory, and schedules when and where to place each tensor in the pool. Compared with PyTorch on an integrated embedded GPU across six DNNs, Occamy reduces memory usage by 34.6% and achieves a geometric-mean speedup of 1.25x.
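The core idea the abstract describes, placing tensors at overlapping offsets in one memory pool whenever their liveness intervals do not overlap, can be illustrated with a small greedy planner. This is a hypothetical sketch, not Occamy's actual algorithm; the tensor list, interval representation, and best-fit heuristic are all illustrative assumptions.

```python
# Sketch of liveness-based tensor placement in a shared memory pool
# (illustrative only; not Occamy's actual algorithm). Each tensor has a
# byte size and a [first_use, last_use] liveness interval over the
# operation schedule; tensors with overlapping intervals must not share bytes.

def plan_pool(tensors):
    """tensors: list of (name, size, first_use, last_use).
    Returns (pool_size, {name: offset}) via a greedy first-fit placement."""
    placed = []   # (offset, size, first_use, last_use)
    offsets = {}
    # Place larger tensors first to reduce fragmentation.
    for name, size, first, last in sorted(tensors, key=lambda t: -t[1]):
        # Byte ranges occupied by tensors whose lifetimes overlap this one.
        busy = sorted((off, off + sz) for off, sz, f, l in placed
                      if not (last < f or l < first))
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break              # found a gap before this busy range
            offset = max(offset, hi)
        placed.append((offset, size, first, last))
        offsets[name] = offset
    pool_size = max((off + sz for off, sz, _, _ in placed), default=0)
    return pool_size, offsets

# Example: a 3-tensor chain where t0 dies after op 1, so t2 reuses its bytes.
tensors = [("t0", 1024, 0, 1), ("t1", 2048, 1, 2), ("t2", 1024, 2, 3)]
size, layout = plan_pool(tensors)
# Pool is 3072 bytes instead of the naive 4096: t2 lands at t0's offset.
```

Here the planner sizes the pool to the maximum memory simultaneously live (t0 plus t1), mirroring the abstract's "maximum required memory size" calculation, and reuse falls out of the non-overlapping intervals.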
Pages: 6