4-bit CNN Quantization Method With Compact LUT-Based Multiplier Implementation on FPGA

Cited by: 3
Authors
Zhao, Bingrui [1]
Wang, Yaonan [1]
Zhang, Hui [1]
Zhang, Jinzhou [2]
Chen, Yurong [1]
Yang, Yimin [3]
Affiliations
[1] Hunan Univ, Natl Engn Res Ctr Robot Visual Percept & Control T, Sch Robot, Changsha 410082, Hunan, Peoples R China
[2] Changsha Univ Sci & Technol, Sch Elect & Informat Engn, Changsha 410114, Hunan, Peoples R China
[3] Western Univ, Dept Elect & Comp Engn, London, ON N6A 3K7, Canada
Funding
National Natural Science Foundation of China
Keywords
Quantization (signal); Field programmable gate arrays; Convolutional neural networks; Hardware; Convolution; Complexity theory; Power demand; Convolutional neural network (CNN); digital circuit; field-programmable gate array (FPGA); lookup table (LUT)-based multiplier; low-precision quantization; NEURAL-NETWORKS; YOLO CNN; ACCELERATION;
DOI
10.1109/TIM.2023.3324357
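Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract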
To address the challenge of deploying convolutional neural networks (CNNs) on resource-limited edge devices, this article presents an effective 4-bit quantization scheme for CNNs and proposes a DSP-free multiplier solution for deploying quantized neural networks on field-programmable gate array (FPGA) devices. Specifically, we first introduce a threshold-aware quantization (TAQ) method with a mixed rounding strategy to reduce the model size while maintaining the accuracy of the original full-precision model. Experimental results demonstrate that the proposed quantization method retains a high classification accuracy for 4-bit quantized CNN models. In addition, we propose a compact lookup table-based multiplier (CLM) design that replaces numerical multiplication with a lookup table (LUT) of precomputed 4-bit multiplication results, leveraging LUT6 resources instead of scarce DSP blocks to improve the scalability of FPGAs for multiplication-intensive CNN algorithms. The proposed 4-bit CLM consumes only 13 LUT6 resources, outperforming existing LUT-based multipliers (LMULs) in terms of resource consumption. Together, the proposed CNN quantization and CLM scheme effectively reduces FPGA resource consumption when implementing image classification tasks, providing strong support for deep learning algorithms in unmanned systems, industrial inspection, and other relevant vision and measurement scenarios running on DSP-constrained edge devices.
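The idea of replacing multiplication with a table of precomputed 4-bit products can be illustrated in software. The following is a minimal sketch only, not the paper's CLM circuit: it assumes signed two's-complement 4-bit operands (the abstract does not state the number format), and the names `build_product_table`, `lut_mul4`, and `lut_dot` are hypothetical.

```python
# Software analogue of a lookup-table multiplier for 4-bit operands.
# Assumption: operands are signed two's-complement values in -8..7.
# This does not model the LUT6 mapping or the 13-LUT6 cost reported in the abstract.

def build_product_table():
    """Precompute all 16 x 16 products of signed 4-bit operands."""
    table = [0] * 256
    for a in range(-8, 8):
        for b in range(-8, 8):
            # The index concatenates the two 4-bit two's-complement codes.
            table[((a & 0xF) << 4) | (b & 0xF)] = a * b
    return table


PRODUCTS = build_product_table()


def lut_mul4(a, b):
    """Return a * b for signed 4-bit a and b using only a table lookup."""
    if not (-8 <= a <= 7 and -8 <= b <= 7):
        raise ValueError("operands must fit in signed 4 bits")
    return PRODUCTS[((a & 0xF) << 4) | (b & 0xF)]


def lut_dot(xs, ws):
    """Dot product of two 4-bit vectors; every product comes from the table."""
    return sum(lut_mul4(x, w) for x, w in zip(xs, ws))
```

In this sketch the 256-entry table stands in for the precomputed results; a convolution over 4-bit weights and activations would accumulate such looked-up products the way `lut_dot` does.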
Pages: 1-10
Number of pages: 10
Related papers
36 in total
  • [31] LSMQ: A Layer-Wise Sensitivity-Based Mixed-Precision Quantization Method for Bit-Flexible CNN Accelerator
    Huang, Yimin
    Chen, Kai
    Shao, Zhuang
    Bai, Yichuan
    Huang, Yafeng
    Du, Yuan
    Du, Li
    Wang, Zhongfeng
    18TH INTERNATIONAL SOC DESIGN CONFERENCE 2021 (ISOCC 2021), 2021, : 256 - 257
  • [32] Fast and Light-weight Binarized Neural Network Implemented in an FPGA using LUT-based Signal Processing and its Time-domain Extension for Multi-bit Processing
    Fuchikami, Ryuji
    Issiki, Fumio
    2019 IEEE 9TH INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE-BERLIN), 2019, : 120 - 121
  • [33] Secure and compact implementation of optimized Montgomery multiplier based elliptic curve cryptography on FPGA with road vehicular traffic collecting protocol for VANET application
    Baskar, S.
    Dhulipala, V. R. Sarma
    INTERNATIONAL JOURNAL OF HEAVY VEHICLE SYSTEMS, 2018, 25 (3-4) : 485 - 497
  • [34] Carry look-ahead and ripple carry method based 4-bit carry generator circuit for implementing wide-word length adder
    Khan, Anum
    Chakraborty, Arindom
    Joy, Upal Barua
    Wairya, Subodh
    Hasan, Mehedi
    MICROELECTRONICS JOURNAL, 2023, 140
  • [35] Wide word-length carry-select adder design using ripple carry and carry look-ahead method based hybrid 4-bit carry generator
    Hasan, Mehedi
    Chowdhury, Sujan
    Faruqe, Omar
    Chakraborty, Arindom
    Zaman, Hasan U.
    Islam, Sharnali
    ENGINEERING REPORTS, 2024, 6 (02)
  • [36] Design and Implementation of a 16-bit Multi-mode 4-Channel Time-Interleaved Delta-Sigma Modulator with SNDR > 106 dB and DCE Compensation Based on FPGA
    Roshanpanah, Abolfazl
    Torkzadeh, Pooya
    Hajsadeghi, Khosrow
    Dousti, Massoud
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2024, : 2473 - 2502