Compression of Deep Neural Networks with Structured Sparse Ternary Coding

Cited by: 0
Authors
Yoonho Boo
Wonyong Sung
Institution
[1] Seoul National University, School of Electrical Engineering, Neural Processing Research Center
Keywords
Deep neural networks; Weight compression; Structured sparsity; Fixed-point quantization; Pruning
DOI
Not available
Abstract
Deep neural networks (DNNs) contain a large number of weights and usually require many off-chip memory accesses for inference. Weight size compression is a major requirement for on-chip memory based implementations of DNNs, as it not only increases inference speed but also reduces power consumption. We propose a weight compression method for deep neural networks that combines pruning and quantization. The proposed method allows weights to take values of +1 or −1 only at predetermined positions. A look-up table then stores all possible combinations of sub-vectors of the weight matrices, so that encoding and decoding of the structured sparse weights can be carried out easily with the table. This method not only enables multiplication-free DNN implementations but also compresses the weight storage by as much as 32× compared with floating-point networks, with only a tiny performance loss. Weight distribution normalization and gradual pruning techniques are applied to reduce the performance degradation. Experiments are conducted with fully connected DNNs and convolutional neural networks.
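Below is a minimal sketch of how such a look-up-table-based coding scheme could work, assuming illustrative parameters (sub-vectors of length 8 with at most 2 nonzero ±1 entries) and hypothetical helper names (build_codebook, encode, decode); it is not the authors' implementation. Every admissible ternary sub-vector pattern is enumerated once, and each sub-vector is then stored as a single table index, which for these assumed parameters costs 8 bits per 8 weights versus 256 bits in float32, i.e. roughly the 32× reduction mentioned in the abstract.

```python
# Sketch of structured sparse ternary coding: weights are restricted to
# {-1, 0, +1}, each sub-vector may contain only a fixed number of nonzeros,
# and all admissible sub-vector patterns are enumerated in a look-up table.
# Parameters and function names are illustrative assumptions.
from itertools import combinations, product

import numpy as np


def build_codebook(sub_len=8, max_nonzeros=2):
    """Enumerate every sub-vector of length `sub_len` with at most
    `max_nonzeros` entries drawn from {-1, +1} (the rest are zero)."""
    patterns = []
    for k in range(max_nonzeros + 1):
        for positions in combinations(range(sub_len), k):
            for signs in product((-1, 1), repeat=k):
                vec = np.zeros(sub_len, dtype=np.int8)
                for pos, s in zip(positions, signs):
                    vec[pos] = s
                patterns.append(vec)
    return np.stack(patterns)                # shape: (num_patterns, sub_len)


def encode(weights, codebook):
    """Map each sub-vector of a ternary weight vector to its codebook index."""
    sub_len = codebook.shape[1]
    subs = weights.reshape(-1, sub_len)
    # Brute-force nearest-pattern match; in practice pruning would constrain
    # training so every sub-vector is already a valid pattern (exact match).
    dists = ((subs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1).astype(np.uint16)


def decode(indices, codebook):
    """Reconstruct the ternary weight vector from the stored indices."""
    return codebook[indices].reshape(-1)


if __name__ == "__main__":
    codebook = build_codebook(sub_len=8, max_nonzeros=2)
    bits_per_sub = int(np.ceil(np.log2(len(codebook))))
    print(f"{len(codebook)} patterns -> {bits_per_sub} bits per 8 weights "
          f"vs. {8 * 32} bits in float32")

    # Toy ternary weight vector that already satisfies the structural
    # constraint (at most 2 nonzeros in each sub-vector of 8 weights).
    w = np.zeros(32, dtype=np.int8)
    w[[1, 5, 10, 20, 21, 30]] = [1, -1, 1, -1, 1, 1]
    idx = encode(w, codebook)
    assert np.array_equal(decode(idx, codebook), w)
```

In the paper's setting, the gradual pruning stage would enforce the positional constraint during training, so that encoding reduces to an exact table lookup rather than the nearest-pattern search used in this sketch.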
Pages: 1009-1019
Page count: 10
Related Papers
50 records in total (items [41]-[50] shown)
  • [41] Sparse low rank factorization for deep neural network compression
    Swaminathan, Sridhar
    Garg, Deepak
    Kannan, Rajkumar
    Andres, Frederic
    NEUROCOMPUTING, 2020, 398: 185-196
  • [42] A survey of model compression for deep neural networks
    Li J.-Y.
    Zhao Y.-K.
    Xue Z.-E.
    Cai Z.
    Li Q.
    Gongcheng Kexue Xuebao/Chinese Journal of Engineering, 2019, 41(10): 1229-1239
  • [43] Convolutional Neural Networks Analyzed via Convolutional Sparse Coding
    Papyan, Vardan
    Romano, Yaniv
    Elad, Michael
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18: 1-52
  • [44] Model Compression for Deep Neural Networks: A Survey
    Li, Zhuo
    Li, Hengyi
    Meng, Lin
    COMPUTERS, 2023, 12(03)
  • [45] Update Compression for Deep Neural Networks on the Edge
    Chen, Bo
    Bakhshi, Ali
    Batista, Gustavo
    Ng, Brian
    Chin, Tat-Jun
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 3075-3085
  • [46] Efficient computation via sparse coding in electrosensory neural networks
    Chacron, Maurice J.
    Longtin, Andre
    Maler, Leonard
    CURRENT OPINION IN NEUROBIOLOGY, 2011, 21(05): 752-760
  • [47] DEEP SPARSE RECTIFIER NEURAL NETWORKS FOR SPEECH DENOISING
    Xu, Lie
    Choy, Chiu-Sing
    Li, Yi-Wen
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016
  • [48] PHONE RECOGNITION WITH DEEP SPARSE RECTIFIER NEURAL NETWORKS
    Toth, Laszlo
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 6985-6989
  • [49] Compressing Deep Neural Networks With Sparse Matrix Factorization
    Wu, Kailun
    Guo, Yiwen
    Zhang, Changshui
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31(10): 3828-3838
  • [50] Performance of Training Sparse Deep Neural Networks on GPUs
    Wang, Jianzong
    Huang, Zhangcheng
    Kong, Lingwei
    Xiao, Jing
    Wang, Pengyu
    Zhang, Lu
    Li, Chao
    2019 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2019