4-bit CNN Quantization Method With Compact LUT-Based Multiplier Implementation on FPGA

Cited by: 3

Authors:

Zhao, Bingrui [1]

Wang, Yaonan [1]

Zhang, Hui [1]

Zhang, Jinzhou [2]

Chen, Yurong [1]

Yang, Yimin [3]

Affiliations:

[1] Hunan Univ, Natl Engn Res Ctr Robot Visual Percept & Control T, Sch Robot, Changsha 410082, Hunan, Peoples R China

[2] Changsha Univ Sci & Technol, Sch Elect & Informat Engn, Changsha 410114, Hunan, Peoples R China

[3] Western Univ, Dept Elect & Comp Engn, London, ON N6A 3K7, Canada

Source:

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT | 2023 / Vol. 72

Funding:

National Natural Science Foundation of China;

Keywords:

Quantization (signal); Field programmable gate arrays; Convolutional neural networks; Hardware; Convolution; Complexity theory; Power demand; Convolutional neural network (CNN); digital circuit; field-programmable gate array (FPGA); lookup table (LUT)-based multiplier; low-precision quantization; NEURAL-NETWORKS; YOLO CNN; ACCELERATION;

DOI:

10.1109/TIM.2023.3324357

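Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract: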

To address the challenge of deploying convolutional neural networks (CNNs) on edge devices with limited resources, this article presents an effective 4-bit quantization scheme for CNNs and proposes a DSP-free multiplier solution for deploying quantized neural networks on field-programmable gate array (FPGA) devices. Specifically, we first introduce a threshold-aware quantization (TAQ) method with a mixed rounding strategy to compress the model while maintaining the accuracy of the original full-precision model. Experimental results demonstrate that the proposed quantization method retains a high classification accuracy for 4-bit quantized CNN models. In addition, we propose a compact lookup table-based multiplier (CLM) design that replaces numerical multiplication with a lookup table (LUT) of precomputed 4-bit multiplication results, leveraging LUT6 resources instead of scarce DSP blocks to improve the scalability of FPGAs when implementing multiplication-intensive CNN algorithms. The proposed 4-bit CLM consumes only 13 LUT6 resources, fewer than existing LUT-based multipliers (LMULs). Together, the proposed CNN quantization and CLM scheme reduce FPGA resource consumption when implementing image classification tasks, providing strong support for deep learning algorithms in unmanned systems, industrial inspection, and other relevant vision and measurement scenarios running on DSP-constrained edge devices.
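The idea of replacing multiplication with a table of precomputed 4-bit products can be sketched in software. The following is only an illustration under stated assumptions, not the paper's CLM circuit or its TAQ method: the signed range -8..7, the function names (`build_lut4`, `lut_mul`, `quantize4`, `lut_dot`), and the plain symmetric round-to-nearest quantizer are all hypothetical choices made for this example.

```python
# Hypothetical sketch: software analogue of a lookup-table multiplier.
# Assumes signed 4-bit operands in [-8, 7]; the paper's number format,
# TAQ quantizer, and 13-LUT6 hardware mapping are NOT reproduced here.

def build_lut4():
    """Precompute all 16 x 16 products of signed 4-bit values."""
    return [[a * b for b in range(-8, 8)] for a in range(-8, 8)]

LUT4 = build_lut4()

def lut_mul(a, b):
    """Multiply two signed 4-bit integers by table lookup."""
    if not (-8 <= a <= 7 and -8 <= b <= 7):
        raise ValueError("operands must fit in signed 4 bits")
    # Offset by 8 so that -8 maps to index 0 and 7 maps to index 15.
    return LUT4[a + 8][b + 8]

def quantize4(x, scale):
    """Plain symmetric quantizer (an assumption, not TAQ).

    Uses Python's round(), which rounds exact halves to the nearest
    even integer, then clips the result to the signed 4-bit range.
    """
    q = round(x / scale)
    return max(-8, min(7, q))

def lut_dot(xs, ws):
    """Dot product of two 4-bit vectors using only lookups and additions."""
    return sum(lut_mul(x, w) for x, w in zip(xs, ws))
```

A convolution on quantized values then reduces to repeated calls such as `lut_dot([1, -2, 3], [4, 5, -6])`, where each product is fetched from the 256-entry table rather than computed by a multiplier.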

Pages: 1 / 10

Page count: 10
