dCSR: A Memory-Efficient Sparse Matrix Representation for Parallel Neural Network Inference

Cited by: 4
Authors
Trommer, Elias [1 ]
Waschneck, Bernd [1 ]
Kumar, Akash [2 ]
Affiliations
[1] Tech Univ Dresden, Infineon Technol, Dresden, Germany
[2] Tech Univ Dresden, Ctr Adv Elect Cfaed, Dresden, Germany
Keywords
pruning; sparse neural networks; SIMD; embedded systems; compression
DOI
10.1109/ICCAD51958.2021.9643506
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Reducing the memory footprint of neural networks is a crucial prerequisite for deploying them in small and low-cost embedded devices. Network parameters can often be reduced significantly through pruning. We discuss how to best represent the indexing overhead of sparse networks for the coming generation of Single Instruction, Multiple Data (SIMD)-capable microcontrollers. From this, we develop Delta-Compressed Storage Row (dCSR), a storage format for sparse matrices that allows for both low-overhead storage and fast inference on embedded systems with wide SIMD units. We demonstrate our method on an ARM Cortex-M55 MCU prototype with M-Profile Vector Extension (MVE). A comparison of memory consumption and throughput shows that our method achieves competitive compression ratios and increases throughput over dense methods by up to 2.9x for sparse matrix-vector multiplication (SpMV)-based kernels and 1.06x for sparse matrix-matrix multiplication (SpMM). This is accomplished by handling the generation of index information directly in the SIMD unit, which increases effective memory bandwidth.
Pages: 9
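Illustrative code sketch
The abstract describes storing sparse-index information as compressed deltas and regenerating absolute column indices inside the SIMD unit during inference. The following minimal C sketch illustrates that idea in scalar form, assuming one 8-bit delta per non-zero and a running sum in place of the in-register expansion; the DeltaCSR struct, the function names, and the delta width are illustrative assumptions, not the authors' reference implementation or on-device encoding.

/* Illustrative sketch of a delta-compressed sparse row format in the spirit
 * of dCSR: column indices are stored as small deltas and expanded back into
 * absolute indices while computing.  Names and the 8-bit delta width are
 * assumptions for readability, not the paper's reference implementation. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int      rows, cols;
    int32_t *row_ptr;    /* rows + 1 offsets into values/col_delta         */
    float   *values;     /* non-zero values                                */
    uint8_t *col_delta;  /* per-nonzero delta to the previous column index */
} DeltaCSR;

/* Build the delta-encoded matrix from a dense row-major matrix.  Assumes
 * every column gap fits into 8 bits; a production format would need an
 * escape mechanism for larger gaps. */
static DeltaCSR dcsr_from_dense(const float *dense, int rows, int cols)
{
    DeltaCSR m = { rows, cols, NULL, NULL, NULL };
    int nnz = 0;
    for (int i = 0; i < rows * cols; i++)
        nnz += (dense[i] != 0.0f);

    m.row_ptr   = malloc((size_t)(rows + 1) * sizeof *m.row_ptr);
    m.values    = malloc((size_t)nnz * sizeof *m.values);
    m.col_delta = malloc((size_t)nnz * sizeof *m.col_delta);

    int k = 0;
    for (int r = 0; r < rows; r++) {
        m.row_ptr[r] = k;
        int prev_col = 0;                 /* deltas start from column 0 */
        for (int c = 0; c < cols; c++) {
            float v = dense[r * cols + c];
            if (v == 0.0f)
                continue;
            m.values[k]    = v;
            m.col_delta[k] = (uint8_t)(c - prev_col);
            prev_col       = c;
            k++;
        }
    }
    m.row_ptr[rows] = k;
    return m;
}

/* Sparse matrix-vector product y = A * x.  The running sum over col_delta
 * reconstructs absolute column indices on the fly, standing in for the
 * in-register expansion a vector unit would perform. */
static void dcsr_spmv(const DeltaCSR *m, const float *x, float *y)
{
    for (int r = 0; r < m->rows; r++) {
        float acc = 0.0f;
        int   col = 0;
        for (int32_t k = m->row_ptr[r]; k < m->row_ptr[r + 1]; k++) {
            col += m->col_delta[k];       /* delta -> absolute column index */
            acc += m->values[k] * x[col];
        }
        y[r] = acc;
    }
}

int main(void)
{
    /* 3x4 pruned weight matrix (5 of 12 entries non-zero) */
    const float dense[12] = { 0, 2, 0, 1,
                              3, 0, 0, 4,
                              0, 0, 5, 0 };
    const float x[4] = { 1, 1, 1, 1 };
    float y[3];

    DeltaCSR m = dcsr_from_dense(dense, 3, 4);
    dcsr_spmv(&m, x, y);
    for (int r = 0; r < 3; r++)
        printf("y[%d] = %.1f\n", r, y[r]); /* expect 3.0, 7.0, 5.0 */

    free(m.row_ptr);
    free(m.values);
    free(m.col_delta);
    return 0;
}

With one byte of index data per non-zero instead of the 16- or 32-bit absolute indices of plain CSR, the index stream shrinks considerably, which is where the compression and effective-bandwidth gains described in the abstract come from; the actual dCSR bitstream and its MVE decoding differ in detail.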