dCSR: A Memory-Efficient Sparse Matrix Representation for Parallel Neural Network Inference

Cited by: 4
Authors
Trommer, Elias [1 ]
Waschneck, Bernd [1 ]
Kumar, Akash [2 ]
Affiliations
[1] Tech Univ Dresden, Infineon Technol, Dresden, Germany
[2] Tech Univ Dresden, Ctr Adv Elect Cfaed, Dresden, Germany
Keywords
pruning; sparse neural networks; SIMD; embedded systems; compression
DOI
10.1109/ICCAD51958.2021.9643506
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Reducing the memory footprint of neural networks is a crucial prerequisite for deploying them in small and low-cost embedded devices. Network parameters can often be reduced significantly through pruning. We discuss how to best represent the indexing overhead of sparse networks for the coming generation of Single Instruction, Multiple Data (SIMD)-capable microcontrollers. From this, we develop Delta-Compressed Storage Row (dCSR), a storage format for sparse matrices that allows for both low-overhead storage and fast inference on embedded systems with wide SIMD units. We demonstrate our method on an ARM Cortex-M55 MCU prototype with M-Profile Vector Extension (MVE). A comparison of memory consumption and throughput shows that our method achieves competitive compression ratios and increases throughput over dense methods by up to 2.9x for sparse matrix-vector multiplication (SpMV)-based kernels and 1.06x for sparse matrix-matrix multiplication (SpMM). This is accomplished by handling the generation of index information directly in the SIMD unit, which increases effective memory bandwidth.
Pages: 9
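Illustrative code sketch
The abstract describes storing sparse-index information as compressed deltas and regenerating absolute column indices inside the SIMD unit during inference. The following minimal C sketch illustrates that idea in scalar form, assuming one 8-bit delta per non-zero and a running sum in place of the in-register expansion; the DeltaCSR struct, the function names, and the delta width are illustrative assumptions, not the authors' reference implementation or on-device encoding.

/* Illustrative sketch of a delta-compressed sparse row format in the spirit
 * of dCSR: column indices are stored as small deltas and expanded back into
 * absolute indices while computing.  Names and the 8-bit delta width are
 * assumptions for readability, not the paper's reference implementation. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int      rows, cols;
    int32_t *row_ptr;    /* rows + 1 offsets into values/col_delta         */
    float   *values;     /* non-zero values                                */
    uint8_t *col_delta;  /* per-nonzero delta to the previous column index */
} DeltaCSR;

/* Build the delta-encoded matrix from a dense row-major matrix.  Assumes
 * every column gap fits into 8 bits; a production format would need an
 * escape mechanism for larger gaps. */
static DeltaCSR dcsr_from_dense(const float *dense, int rows, int cols)
{
    DeltaCSR m = { rows, cols, NULL, NULL, NULL };
    int nnz = 0;
    for (int i = 0; i < rows * cols; i++)
        nnz += (dense[i] != 0.0f);

    m.row_ptr   = malloc((size_t)(rows + 1) * sizeof *m.row_ptr);
    m.values    = malloc((size_t)nnz * sizeof *m.values);
    m.col_delta = malloc((size_t)nnz * sizeof *m.col_delta);

    int k = 0;
    for (int r = 0; r < rows; r++) {
        m.row_ptr[r] = k;
        int prev_col = 0;                 /* deltas start from column 0 */
        for (int c = 0; c < cols; c++) {
            float v = dense[r * cols + c];
            if (v == 0.0f)
                continue;
            m.values[k]    = v;
            m.col_delta[k] = (uint8_t)(c - prev_col);
            prev_col       = c;
            k++;
        }
    }
    m.row_ptr[rows] = k;
    return m;
}

/* Sparse matrix-vector product y = A * x.  The running sum over col_delta
 * reconstructs absolute column indices on the fly, standing in for the
 * in-register expansion a vector unit would perform. */
static void dcsr_spmv(const DeltaCSR *m, const float *x, float *y)
{
    for (int r = 0; r < m->rows; r++) {
        float acc = 0.0f;
        int   col = 0;
        for (int32_t k = m->row_ptr[r]; k < m->row_ptr[r + 1]; k++) {
            col += m->col_delta[k];       /* delta -> absolute column index */
            acc += m->values[k] * x[col];
        }
        y[r] = acc;
    }
}

int main(void)
{
    /* 3x4 pruned weight matrix (5 of 12 entries non-zero) */
    const float dense[12] = { 0, 2, 0, 1,
                              3, 0, 0, 4,
                              0, 0, 5, 0 };
    const float x[4] = { 1, 1, 1, 1 };
    float y[3];

    DeltaCSR m = dcsr_from_dense(dense, 3, 4);
    dcsr_spmv(&m, x, y);
    for (int r = 0; r < 3; r++)
        printf("y[%d] = %.1f\n", r, y[r]); /* expect 3.0, 7.0, 5.0 */

    free(m.row_ptr);
    free(m.values);
    free(m.col_delta);
    return 0;
}

With one byte of index data per non-zero instead of the 16- or 32-bit absolute indices of plain CSR, the index stream shrinks considerably, which is where the compression and effective-bandwidth gains described in the abstract come from; the actual dCSR bitstream and its MVE decoding differ in detail.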