4-bit CNN Quantization Method With Compact LUT-Based Multiplier Implementation on FPGA

Cited by: 3
Authors
Zhao, Bingrui [1 ]
Wang, Yaonan [1 ]
Zhang, Hui [1 ]
Zhang, Jinzhou [2 ]
Chen, Yurong [1 ]
Yang, Yimin [3 ]
Affiliations
[1] Hunan Univ, Natl Engn Res Ctr Robot Visual Percept & Control T, Sch Robot, Changsha 410082, Hunan, Peoples R China
[2] Changsha Univ Sci & Technol, Sch Elect & Informat Engn, Changsha 410114, Hunan, Peoples R China
[3] Western Univ, Dept Elect & Comp Engn, London, ON N6A 3K7, Canada
Funding
National Natural Science Foundation of China;
Keywords
Quantization (signal); Field programmable gate arrays; Convolutional neural networks; Hardware; Convolution; Complexity theory; Power demand; Convolutional neural network (CNN); digital circuit; field-programmable gate array (FPGA); lookup table (LUT)-based multiplier; low-precision quantization; NEURAL-NETWORKS; YOLO CNN; ACCELERATION;
DOI
10.1109/TIM.2023.3324357
CLC Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Subject Classification
0808 ; 0809 ;
Abstract
To address the challenge of deploying convolutional neural networks (CNNs) on resource-limited edge devices, this article presents an effective 4-bit quantization scheme for CNNs and proposes a DSP-free multiplier solution for deploying quantized neural networks on field-programmable gate array (FPGA) devices. Specifically, we first introduce a threshold-aware quantization (TAQ) method with a mixed rounding strategy that compresses the model while maintaining the accuracy of the original full-precision network. Experimental results demonstrate that the proposed quantization method retains high classification accuracy for 4-bit quantized CNN models. In addition, we propose a compact lookup-table-based multiplier (CLM) design that replaces numerical multiplication with a lookup table (LUT) of precomputed 4-bit multiplication results, leveraging LUT6 resources instead of scarce DSP blocks and thereby improving the scalability of FPGAs for multiplication-intensive CNN algorithms. The proposed 4-bit CLM consumes only 13 LUT6 resources, surpassing existing LUT-based multipliers (LMULs) in resource consumption. Together, the proposed CNN quantization and CLM multiplier scheme substantially reduces FPGA resource consumption for image classification tasks, providing strong support for deep learning algorithms in unmanned systems, industrial inspection, and other vision and measurement scenarios running on DSP-constrained edge devices.
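The core idea of the CLM — replacing numerical multiplication with precomputed 4-bit products — can be illustrated with a minimal software sketch. This is not the authors' hardware implementation (the paper's CLM is a 13-LUT6 circuit); the function name `lut_mul4` and the table layout are illustrative assumptions, showing only the table-lookup principle for unsigned 4-bit operands.

```python
# Illustrative sketch, not the paper's CLM circuit: a 4-bit multiplier
# realized as a lookup over all 16 x 16 = 256 precomputed products,
# mirroring the idea of substituting stored results for multiplication.
LUT = [[a * b for b in range(16)] for a in range(16)]

def lut_mul4(a: int, b: int) -> int:
    """Multiply two unsigned 4-bit operands via table lookup."""
    assert 0 <= a < 16 and 0 <= b < 16, "operands must fit in 4 bits"
    return LUT[a][b]
```

In hardware, the same principle maps the product bits onto LUT6 primitives instead of DSP slices, which is what makes the approach attractive on DSP-constrained FPGAs.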
Pages: 1-10
Related Papers
36 records in total
  • [21] Compact 4-bit all optical digital to analog converter based on photonic crystal ring resonators
    Sridarshini, T.
    Indira Gandhi, S.
    Jannath Ui Firthouse, V. N.
    LASER PHYSICS, 2020, 30 (11)
  • [22] LSI implementation of a low-power 4 x 4-bit array two-phase clocked adiabatic static CMOS logic multiplier
    Nayan, Nazrul Anuar
    Takahashi, Yasuhiro
    Sekine, Toshikazu
    MICROELECTRONICS JOURNAL, 2012, 43 (04) : 244 - 249
  • [23] LUTein: Dense-Sparse Bit-slice Architecture with Radix-4 LUT-based Slice-Tensor Processing Units
    Im, Dongseok
    Yoo, Hoi-Jun
    2024 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, HPCA 2024, 2024, : 747 - 759
  • [24] An FPGA Implementation of a Quadruple-Based Multiplier for 4D Clifford Algebra
    Franchini, S.
    Gentile, A.
    Sorbello, F.
    Vassallo, G.
    Vitabile, S.
    11TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN - ARCHITECTURES, METHODS AND TOOLS : DSD 2008, PROCEEDINGS, 2008, : 743 - +
  • [25] A Hardware-Friendly Low-Bit Power-of-Two Quantization Method for CNNs and Its FPGA Implementation
    Sui, Xuefu
    Lv, Qunbo
    Bai, Yang
    Zhu, Baoyu
    Zhi, Liangjie
    Yang, Yuanbo
    Tan, Zheng
    SENSORS, 2022, 22 (17)
  • [26] Multiplication-Free Lookup-Based CNN Accelerator Using Residual Vector Quantization and Its FPGA Implementation
    Fuketa, Hiroshi
    Katashita, Toshihiro
    Hori, Yohei
    Hioki, Masakazu
    IEEE ACCESS, 2024, 12 : 102470 - 102480
  • [27] New Polynomial Based Bit-Level Serial GF(2^4) Multiplier for RS(15,11) 4-bit Codec Optimization
    Mursanto, Petrus
    Nugroho, R. Dimas
    2018 INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS), 2018, : 107 - 112
  • [28] MBFQuant: A Multiplier-Bitwidth-Fixed, Mixed-Precision Quantization Method for Mobile CNN-Based Applications
    Peng, Peng
    You, Mingyu
    Jiang, Kai
    Lian, Youzao
    Xu, Weisheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 2438 - 2453
  • [29] Robust circuit implementation of 4-bit 4-tube CNFET based ALU at 16-nm technology node
    Srivastava, Pragya
    Yadav, Richa
    Srivastava, Richa
    ANALOG INTEGRATED CIRCUITS AND SIGNAL PROCESSING, 2021, 109 (01) : 127 - 134