Bit-Quantized-Net: An Effective Method for Compressing Deep Neural Networks

Citations: 0
Authors
Chunshan Li
Qing Du
Xiaofei Xu
Jinhui Zhu
Dianhui Chu
Affiliations
[1] Harbin Institute of Technology, Department of Computer Science and Technology
[2] South China University of Technology, School of Software Engineering
Abstract
Deep neural networks have achieved state-of-the-art performance across a wide range of scenarios, such as natural language processing, object detection, image classification, and speech recognition. Despite these impressive results, neural network models remain computationally expensive and memory intensive to train and store, which limits their use in mobile service scenarios. Simplifying and accelerating neural networks is therefore a crucial research topic. To address this issue, we propose "Bit-Quantized-Net" (BQ-Net), which compresses deep neural networks at both the training phase and test-time inference, and further reduces model size by compressing the bit-quantized weights. Training or testing a plain neural network model requires tens of millions of y = wx + b computations. BQ-Net instead approximates y = wx + b by y = sign(w)(x ≫ |w|) + b during forward propagation: the network is trained with bit-quantized (power-of-two) weights in the forward pass, while full-precision weights are retained for gradient accumulation in the backward pass. Finally, we apply Huffman coding to encode the bit-shift weights, which further compresses the model. Extensive experiments on three real data sets (MNIST, CIFAR-10, SVHN) show that BQ-Net achieves 10-14× model compression.
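The shift-based approximation described in the abstract can be illustrated with a short NumPy sketch: each weight is rounded to the nearest signed power of two so that the multiply in y = wx + b becomes a bit shift, while a full-precision master copy of the weights is kept for gradient accumulation. The function names, the max_shift cap, and the rounding rule below are assumptions of this sketch, not the authors' released implementation.

```python
import numpy as np

def quantize_to_shifts(w, max_shift=7):
    """Approximate each weight by a signed power of two: w ~= sign(w) * 2**(-k).

    Returns the sign and shift amount k so that w * x can be computed as
    sign(w) * (x >> k) on fixed-point hardware.  max_shift is an assumed cap
    on the shift range for this sketch.
    """
    sign = np.sign(w)
    k = np.round(-np.log2(np.abs(w) + 1e-12))        # ideal shift amount
    k = np.clip(k, 0, max_shift).astype(np.int32)    # keep shifts in a valid range
    return sign, k

def bq_dense_forward(x, w_fp, b):
    """Dense-layer forward pass with bit-quantized weights.

    x    : (batch, in_features) activations
    w_fp : (in_features, out_features) full-precision weights, retained for
           gradient accumulation in the backward pass
    b    : (out_features,) bias, left in full precision
    """
    sign, k = quantize_to_shifts(w_fp)
    w_q = sign * 2.0 ** (-k)        # shift-only weights used in the forward pass
    return x @ w_q + b              # y ~= sign(w) * (x >> |w|) + b, summed over inputs

# Toy usage: the quantized weights drive the forward pass; any gradient update
# would be applied to the full-precision copy w_fp.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w_fp = rng.uniform(-1.0, 1.0, size=(8, 3))   # full-precision master weights
b = np.zeros(3)
y = bq_dense_forward(x, w_fp, b)
print(y.shape)  # (4, 3)
```

Replacing multiplications with shifts is what makes the forward pass cheap on fixed-point hardware, while keeping the full-precision copy stabilizes gradient accumulation during training.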
Pages: 104-113 (9 pages)
Related Papers (showing items [41]-[50] of 50)
  • [41] Quantized Magnetic Domain Wall Synapse for Efficient Deep Neural Networks
    Dhull, Seema
    Al Misba, Walid
    Nisar, Arshid
    Atulasimha, Jayasimha
    Kaushik, Brajesh Kumar
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 10
  • [42] BinaryRelax: A Relaxation Approach for Training Deep Neural Networks with Quantized Weights
    Yin, Penghang
    Zhang, Shuai
    Lyu, Jiancheng
    Osher, Stanley
    Qi, Yingyong
    Xin, Jack
    SIAM JOURNAL ON IMAGING SCIENCES, 2018, 11 (04): : 2205 - 2223
  • [43] FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference
    Ding, Ruizhou
    Liu, Zeye
    Chin, Ting-Wu
    Marculescu, Diana
    Blanton, R. D.
    PROCEEDINGS OF THE 2019 56TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2019
  • [44] A Methodology to Design Quantized Deep Neural Networks for Automatic Modulation Recognition
    Goez, David
    Soto, Paola
    Latre, Steven
    Gaviria, Natalia
    Camelo, Miguel
    ALGORITHMS, 2022, 15 (12)
  • [45] Human Activity Recognition on Microcontrollers with Quantized and Adaptive Deep Neural Networks
    Daghero, Francesco
    Burrello, Alessio
    Xie, Chen
    Castellano, Marco
    Gandolfi, Luca
    Calimera, Andrea
    Macii, Enrico
    Poncino, Massimo
    Pagliari, Daniele Jahier
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2022, 21 (04)
  • [46] Quantized Guided Pruning for Efficient Hardware Implementations of Deep Neural Networks
    Hacene, Ghouthi Boukli
    Gripon, Vincent
    Arzel, Matthieu
    Farrugia, Nicolas
    Bengio, Yoshua
    2020 18TH IEEE INTERNATIONAL NEW CIRCUITS AND SYSTEMS CONFERENCE (NEWCAS'20), 2020, : 206 - 209
  • [47] Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks
    Gong, Cheng
    Lu, Ye
    Xie, Kunpeng
    Jin, Zongming
    Li, Tao
    Wang, Yanzhi
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 3178 - 3193
  • [48] Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks
    Faraone, Julian
    Fraser, Nicholas
    Gambardella, Giulio
    Blott, Michaela
    Leong, Philip H. W.
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT II, 2017, 10635 : 393 - 404
  • [49] OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Aly, Mohamed M. Sabry
    Lin, Jie
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7780 - 7788
  • [50] COMPRESSING DEEP NEURAL NETWORKS USING TOEPLITZ MATRIX: ALGORITHM DESIGN AND FPGA IMPLEMENTATION
    Liao, Siyu
    Samiee, Ashkan
    Deng, Chunhua
    Bai, Yu
    Yuan, Bo
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 1443 - 1447