Compression of Deep Neural Networks with Structured Sparse Ternary Coding

Cited by: 0
Authors
Yoonho Boo
Wonyong Sung
Institution
[1] Seoul National University, School of Electrical Engineering, Neural Processing Research Center
Keywords
Deep neural networks; Weight compression; Structured sparsity; Fixed-point quantization; Pruning
DOI
Not available
Abstract
Deep neural networks (DNNs) contain a large number of weights and usually require many off-chip memory accesses for inference. Weight size compression is a major requirement for on-chip memory based implementations of DNNs, as it not only increases inference speed but also reduces power consumption. We propose a weight compression method for deep neural networks that combines pruning and quantization. The proposed method allows weights to take values of +1 or −1 only at predetermined positions. A look-up table then stores all possible combinations of sub-vectors of the weight matrices, so that encoding and decoding of the structured sparse weights can be carried out easily with the table. This method not only enables multiplication-free DNN implementations but also compresses the weight storage by as much as 32× compared with floating-point networks, with only a tiny performance loss. Weight distribution normalization and gradual pruning techniques are applied to reduce the performance degradation. Experiments are conducted with fully connected DNNs and convolutional neural networks.
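Below is a minimal sketch of how such a look-up-table-based coding scheme could work, assuming illustrative parameters (sub-vectors of length 8 with at most 2 nonzero ±1 entries) and hypothetical helper names (build_codebook, encode, decode); it is not the authors' implementation. Every admissible ternary sub-vector pattern is enumerated once, and each sub-vector is then stored as a single table index, which for these assumed parameters costs 8 bits per 8 weights versus 256 bits in float32, i.e. roughly the 32× reduction mentioned in the abstract.

```python
# Sketch of structured sparse ternary coding: weights are restricted to
# {-1, 0, +1}, each sub-vector may contain only a fixed number of nonzeros,
# and all admissible sub-vector patterns are enumerated in a look-up table.
# Parameters and function names are illustrative assumptions.
from itertools import combinations, product

import numpy as np


def build_codebook(sub_len=8, max_nonzeros=2):
    """Enumerate every sub-vector of length `sub_len` with at most
    `max_nonzeros` entries drawn from {-1, +1} (the rest are zero)."""
    patterns = []
    for k in range(max_nonzeros + 1):
        for positions in combinations(range(sub_len), k):
            for signs in product((-1, 1), repeat=k):
                vec = np.zeros(sub_len, dtype=np.int8)
                for pos, s in zip(positions, signs):
                    vec[pos] = s
                patterns.append(vec)
    return np.stack(patterns)                # shape: (num_patterns, sub_len)


def encode(weights, codebook):
    """Map each sub-vector of a ternary weight vector to its codebook index."""
    sub_len = codebook.shape[1]
    subs = weights.reshape(-1, sub_len)
    # Brute-force nearest-pattern match; in practice pruning would constrain
    # training so every sub-vector is already a valid pattern (exact match).
    dists = ((subs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1).astype(np.uint16)


def decode(indices, codebook):
    """Reconstruct the ternary weight vector from the stored indices."""
    return codebook[indices].reshape(-1)


if __name__ == "__main__":
    codebook = build_codebook(sub_len=8, max_nonzeros=2)
    bits_per_sub = int(np.ceil(np.log2(len(codebook))))
    print(f"{len(codebook)} patterns -> {bits_per_sub} bits per 8 weights "
          f"vs. {8 * 32} bits in float32")

    # Toy ternary weight vector that already satisfies the structural
    # constraint (at most 2 nonzeros in each sub-vector of 8 weights).
    w = np.zeros(32, dtype=np.int8)
    w[[1, 5, 10, 20, 21, 30]] = [1, -1, 1, -1, 1, 1]
    idx = encode(w, codebook)
    assert np.array_equal(decode(idx, codebook), w)
```

In the paper's setting, the gradual pruning stage would enforce the positional constraint during training, so that encoding reduces to an exact table lookup rather than the nearest-pattern search used in this sketch.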
Pages: 1009-1019
Page count: 10
Related Papers
50 records in total (items [41]-[50] shown)
  • [41] Sparse low rank factorization for deep neural network compression
    Swaminathan, Sridhar
    Garg, Deepak
    Kannan, Rajkumar
    Andres, Frederic
    NEUROCOMPUTING, 2020, 398: 185-196
  • [42] A survey of model compression for deep neural networks
    Li J.-Y.
    Zhao Y.-K.
    Xue Z.-E.
    Cai Z.
    Li Q.
    Gongcheng Kexue Xuebao/Chinese Journal of Engineering, 2019, 41(10): 1229-1239
  • [43] Convolutional Neural Networks Analyzed via Convolutional Sparse Coding
    Papyan, Vardan
    Romano, Yaniv
    Elad, Michael
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18: 1-52
  • [44] Model Compression for Deep Neural Networks: A Survey
    Li, Zhuo
    Li, Hengyi
    Meng, Lin
    COMPUTERS, 2023, 12(03)
  • [45] Update Compression for Deep Neural Networks on the Edge
    Chen, Bo
    Bakhshi, Ali
    Batista, Gustavo
    Ng, Brian
    Chin, Tat-Jun
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 3075-3085
  • [46] Efficient computation via sparse coding in electrosensory neural networks
    Chacron, Maurice J.
    Longtin, Andre
    Maler, Leonard
    CURRENT OPINION IN NEUROBIOLOGY, 2011, 21(05): 752-760
  • [47] DEEP SPARSE RECTIFIER NEURAL NETWORKS FOR SPEECH DENOISING
    Xu, Lie
    Choy, Chiu-Sing
    Li, Yi-Wen
    2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016
  • [48] PHONE RECOGNITION WITH DEEP SPARSE RECTIFIER NEURAL NETWORKS
    Toth, Laszlo
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 6985-6989
  • [49] Compressing Deep Neural Networks With Sparse Matrix Factorization
    Wu, Kailun
    Guo, Yiwen
    Zhang, Changshui
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31(10): 3828-3838
  • [50] Performance of Training Sparse Deep Neural Networks on GPUs
    Wang, Jianzong
    Huang, Zhangcheng
    Kong, Lingwei
    Xiao, Jing
    Wang, Pengyu
    Zhang, Lu
    Li, Chao
    2019 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2019