Bit-Quantized-Net: An Effective Method for Compressing Deep Neural Networks

Citations: 0
Authors
Chunshan Li
Qing Du
Xiaofei Xu
Jinhui Zhu
Dianhui Chu
Affiliations
[1] Harbin Institute of Technology, Department of Computer Science and Technology
[2] South China University of Technology, School of Software Engineering
Abstract
Deep neural networks have achieved state-of-the-art performance across a wide range of scenarios, such as natural language processing, object detection, image classification, and speech recognition. Despite these impressive results, neural network models remain computationally expensive and memory intensive to train and store, which limits their use in mobile service scenarios. Simplifying and accelerating neural networks is therefore a crucial research topic. To address this issue, we propose "Bit-Quantized-Net" (BQ-Net), which compresses deep neural networks at both the training phase and test-time inference, and further reduces model size by compressing the bit-quantized weights. Training or testing a plain neural network model requires tens of millions of y = wx + b computations. BQ-Net instead approximates y = wx + b by y = sign(w)(x ≫ |w|) + b during forward propagation: the network is trained with bit-quantized (power-of-two) weights in the forward pass, while full-precision weights are retained for gradient accumulation in the backward pass. Finally, we apply Huffman coding to encode the bit-shift weights, which further compresses the model. Extensive experiments on three real data sets (MNIST, CIFAR-10, SVHN) show that BQ-Net achieves 10-14× model compression.
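The shift-based approximation described in the abstract can be illustrated with a short NumPy sketch: each weight is rounded to the nearest signed power of two so that the multiply in y = wx + b becomes a bit shift, while a full-precision master copy of the weights is kept for gradient accumulation. The function names, the max_shift cap, and the rounding rule below are assumptions of this sketch, not the authors' released implementation.

```python
import numpy as np

def quantize_to_shifts(w, max_shift=7):
    """Approximate each weight by a signed power of two: w ~= sign(w) * 2**(-k).

    Returns the sign and shift amount k so that w * x can be computed as
    sign(w) * (x >> k) on fixed-point hardware.  max_shift is an assumed cap
    on the shift range for this sketch.
    """
    sign = np.sign(w)
    k = np.round(-np.log2(np.abs(w) + 1e-12))        # ideal shift amount
    k = np.clip(k, 0, max_shift).astype(np.int32)    # keep shifts in a valid range
    return sign, k

def bq_dense_forward(x, w_fp, b):
    """Dense-layer forward pass with bit-quantized weights.

    x    : (batch, in_features) activations
    w_fp : (in_features, out_features) full-precision weights, retained for
           gradient accumulation in the backward pass
    b    : (out_features,) bias, left in full precision
    """
    sign, k = quantize_to_shifts(w_fp)
    w_q = sign * 2.0 ** (-k)        # shift-only weights used in the forward pass
    return x @ w_q + b              # y ~= sign(w) * (x >> |w|) + b, summed over inputs

# Toy usage: the quantized weights drive the forward pass; any gradient update
# would be applied to the full-precision copy w_fp.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w_fp = rng.uniform(-1.0, 1.0, size=(8, 3))   # full-precision master weights
b = np.zeros(3)
y = bq_dense_forward(x, w_fp, b)
print(y.shape)  # (4, 3)
```

Replacing multiplications with shifts is what makes the forward pass cheap on fixed-point hardware, while keeping the full-precision copy stabilizes gradient accumulation during training.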
Pages: 104-113 (9 pages)
Related Papers (showing items [41]-[50] of 50)
  • [41] Quantized Magnetic Domain Wall Synapse for Efficient Deep Neural Networks
    Dhull, Seema
    Al Misba, Walid
    Nisar, Arshid
    Atulasimha, Jayasimha
    Kaushik, Brajesh Kumar
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, : 1 - 10
  • [42] BinaryRelax: A Relaxation Approach for Training Deep Neural Networks with Quantized Weights
    Yin, Penghang
    Zhang, Shuai
    Lyu, Jiancheng
    Osher, Stanley
    Qi, Yingyong
    Xin, Jack
    SIAM JOURNAL ON IMAGING SCIENCES, 2018, 11 (04): : 2205 - 2223
  • [43] FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference
    Ding, Ruizhou
    Liu, Zeye
    Chin, Ting-Wu
    Marculescu, Diana
    Blanton, R. D.
    PROCEEDINGS OF THE 2019 56TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2019
  • [44] A Methodology to Design Quantized Deep Neural Networks for Automatic Modulation Recognition
    Goez, David
    Soto, Paola
    Latre, Steven
    Gaviria, Natalia
    Camelo, Miguel
    ALGORITHMS, 2022, 15 (12)
  • [45] Human Activity Recognition on Microcontrollers with Quantized and Adaptive Deep Neural Networks
    Daghero, Francesco
    Burrello, Alessio
    Xie, Chen
    Castellano, Marco
    Gandolfi, Luca
    Calimera, Andrea
    Macii, Enrico
    Poncino, Massimo
    Pagliari, Daniele Jahier
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2022, 21 (04)
  • [46] Quantized Guided Pruning for Efficient Hardware Implementations of Deep Neural Networks
    Hacene, Ghouthi Boukli
    Gripon, Vincent
    Arzel, Matthieu
    Farrugia, Nicolas
    Bengio, Yoshua
    2020 18TH IEEE INTERNATIONAL NEW CIRCUITS AND SYSTEMS CONFERENCE (NEWCAS'20), 2020, : 206 - 209
  • [47] Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks
    Gong, Cheng
    Lu, Ye
    Xie, Kunpeng
    Jin, Zongming
    Li, Tao
    Wang, Yanzhi
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (11) : 3178 - 3193
  • [48] Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks
    Faraone, Julian
    Fraser, Nicholas
    Gambardella, Giulio
    Blott, Michaela
    Leong, Philip H. W.
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT II, 2017, 10635 : 393 - 404
  • [49] OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Aly, Mohamed M. Sabry
    Lin, Jie
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7780 - 7788
  • [50] COMPRESSING DEEP NEURAL NETWORKS USING TOEPLITZ MATRIX: ALGORITHM DESIGN AND FPGA IMPLEMENTATION
    Liao, Siyu
    Samiee, Ashkan
    Deng, Chunhua
    Bai, Yu
    Yuan, Bo
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 1443 - 1447