4-bit CNN Quantization Method With Compact LUT-Based Multiplier Implementation on FPGA

Cited by: 3
Authors
Zhao, Bingrui [1 ]
Wang, Yaonan [1 ]
Zhang, Hui [1 ]
Zhang, Jinzhou [2 ]
Chen, Yurong [1 ]
Yang, Yimin [3 ]
Affiliations
[1] Hunan Univ, Natl Engn Res Ctr Robot Visual Percept & Control T, Sch Robot, Changsha 410082, Hunan, Peoples R China
[2] Changsha Univ Sci & Technol, Sch Elect & Informat Engn, Changsha 410114, Hunan, Peoples R China
[3] Western Univ, Dept Elect & Comp Engn, London, ON N6A 3K7, Canada
Funding
National Natural Science Foundation of China;
Keywords
Quantization (signal); Field programmable gate arrays; Convolutional neural networks; Hardware; Convolution; Complexity theory; Power demand; Convolutional neural network (CNN); digital circuit; field-programmable gate array (FPGA); lookup table (LUT)-based multiplier; low-precision quantization; NEURAL-NETWORKS; YOLO CNN; ACCELERATION;
DOI
10.1109/TIM.2023.3324357
CLC Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
To address the challenge of deploying convolutional neural networks (CNNs) on resource-limited edge devices, this article presents an effective 4-bit quantization scheme for CNNs and proposes a DSP-free multiplier solution for deploying quantized neural networks on field-programmable gate array (FPGA) devices. Specifically, we first introduce a threshold-aware quantization (TAQ) method with a mixed rounding strategy that compresses the model while maintaining the accuracy of the original full-precision network. Experimental results demonstrate that the proposed quantization method retains high classification accuracy for 4-bit quantized CNN models. In addition, we propose a compact lookup-table-based multiplier (CLM) design that replaces numerical multiplication with a lookup table (LUT) of precomputed 4-bit multiplication results, leveraging LUT6 resources instead of scarce DSP blocks and thereby improving the scalability of FPGAs for multiplication-intensive CNN algorithms. The proposed 4-bit CLM consumes only 13 LUT6 resources, surpassing existing LUT-based multipliers (LMULs) in resource consumption. Together, the proposed CNN quantization and CLM multiplier scheme substantially reduces FPGA resource consumption for image classification tasks, providing strong support for deep learning algorithms in unmanned systems, industrial inspection, and other vision and measurement scenarios running on DSP-constrained edge devices.
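The core idea of the CLM — replacing numerical multiplication with precomputed 4-bit products — can be illustrated with a minimal software sketch. This is not the authors' hardware implementation (the paper's CLM is a 13-LUT6 circuit); the function name `lut_mul4` and the table layout are illustrative assumptions, showing only the table-lookup principle for unsigned 4-bit operands.

```python
# Illustrative sketch, not the paper's CLM circuit: a 4-bit multiplier
# realized as a lookup over all 16 x 16 = 256 precomputed products,
# mirroring the idea of substituting stored results for multiplication.
LUT = [[a * b for b in range(16)] for a in range(16)]

def lut_mul4(a: int, b: int) -> int:
    """Multiply two unsigned 4-bit operands via table lookup."""
    assert 0 <= a < 16 and 0 <= b < 16, "operands must fit in 4 bits"
    return LUT[a][b]
```

In hardware, the same principle maps the product bits onto LUT6 primitives instead of DSP slices, which is what makes the approach attractive on DSP-constrained FPGAs.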
Pages: 1-10
Related Papers
36 records in total
  • [21] Compact 4-bit all optical digital to analog converter based on photonic crystal ring resonators
    Sridarshini, T.
    Indira Gandhi, S.
    Jannath Ui Firthouse, V. N.
    LASER PHYSICS, 2020, 30 (11)
  • [22] LSI implementation of a low-power 4 x 4-bit array two-phase clocked adiabatic static CMOS logic multiplier
    Nayan, Nazrul Anuar
    Takahashi, Yasuhiro
    Sekine, Toshikazu
    MICROELECTRONICS JOURNAL, 2012, 43 (04) : 244 - 249
  • [23] LUTein: Dense-Sparse Bit-slice Architecture with Radix-4 LUT-based Slice-Tensor Processing Units
    Im, Dongseok
    Yoo, Hoi-Jun
    2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024, 2024, : 747 - 759
  • [24] An FPGA Implementation of a Quadruple-Based Multiplier for 4D Clifford Algebra
    Franchini, S.
    Gentile, A.
    Sorbello, F.
    Vassallo, G.
    Vitabile, S.
    11TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN - ARCHITECTURES, METHODS AND TOOLS : DSD 2008, PROCEEDINGS, 2008, : 743 - +
  • [25] A Hardware-Friendly Low-Bit Power-of-Two Quantization Method for CNNs and Its FPGA Implementation
    Sui, Xuefu
    Lv, Qunbo
    Bai, Yang
    Zhu, Baoyu
    Zhi, Liangjie
    Yang, Yuanbo
    Tan, Zheng
    SENSORS, 2022, 22 (17)
  • [26] Multiplication-Free Lookup-Based CNN Accelerator Using Residual Vector Quantization and Its FPGA Implementation
    Fuketa, Hiroshi
    Katashita, Toshihiro
    Hori, Yohei
    Hioki, Masakazu
    IEEE ACCESS, 2024, 12 : 102470 - 102480
  • [27] New Polynomial Based Bit-Level Serial GF(2^4) Multiplier for RS(15,11) 4-bit Codec Optimization
    Mursanto, Petrus
    Nugroho, R. Dimas
    2018 INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS), 2018, : 107 - 112
  • [28] MBFQuant: A Multiplier-Bitwidth-Fixed, Mixed-Precision Quantization Method for Mobile CNN-Based Applications
    Peng, Peng
    You, Mingyu
    Jiang, Kai
    Lian, Youzao
    Xu, Weisheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2438 - 2453
  • [29] Robust circuit implementation of 4-bit 4-tube CNFET based ALU at 16-nm technology node
    Srivastava, Pragya
    Yadav, Richa
    Srivastava, Richa
    ANALOG INTEGRATED CIRCUITS AND SIGNAL PROCESSING, 2021, 109 (01) : 127 - 134