AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Cited by: 0
Authors
Gernigon, Cedric [1 ]
Filip, Silviu-Ioan [1 ]
Sentieys, Olivier [1 ]
Coggiola, Clement [2 ]
Bruno, Mickael [2 ]
Affiliations
[1] Univ Rennes, INRIA, CNRS, IRISA, F-35000 Rennes, France
[2] CNES, Spacecraft Tech, Onboard Data Handling, Toulouse, France
Keywords
Neural Network Compression; Quantization-Aware Training; Adaptive Bit-Width Optimization
DOI
10.1109/AICAS59952.2024.10595895
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, the high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to dealing with deployment constraints, but searching for optimized bit-widths can be difficult. In this work, we present Adaptive Bit-Width Quantization-Aware Training (AdaQAT), a learning-based method that automatically optimizes weight and activation bit-widths during training for more efficient DNN inference. We use relaxed real-valued bit-widths that are updated using a gradient descent rule but are otherwise discretized for all quantization operations. The result is a simple and flexible QAT approach for mixed-precision uniform quantization problems. Unlike other methods that are generally designed to run on a pretrained network, AdaQAT works well in both training-from-scratch and fine-tuning scenarios. Initial results on the CIFAR-10 and ImageNet datasets, using ResNet20 and ResNet18 models respectively, indicate that our method is competitive with other state-of-the-art mixed-precision quantization approaches.
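The core mechanism described in the abstract, a relaxed real-valued bit-width that is discretized for every quantization operation yet updated by gradient descent, can be illustrated with a short PyTorch sketch. This is a minimal, hypothetical rendering: the class AdaptiveBitQuantizer, the ste_round straight-through helper, the symmetric uniform quantizer, and the linear bit-width penalty are all assumptions made for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch of adaptive bit-width QAT: a real-valued bit-width,
# rounded for every quantization operation but updated by plain gradient
# descent. Names and the STE formulation are illustrative assumptions.
import torch


def ste_round(t: torch.Tensor) -> torch.Tensor:
    """Round in the forward pass; pass gradients through unchanged."""
    return t + (t.round() - t).detach()


class AdaptiveBitQuantizer(torch.nn.Module):
    """Symmetric uniform quantizer with a learnable real-valued bit-width."""

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = torch.nn.Parameter(torch.tensor(init_bits))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Discretize the relaxed bit-width; the clamp keeps the grid non-degenerate.
        b = ste_round(self.bits.clamp(min=2.0))
        levels = 2.0 ** (b - 1) - 1          # positive levels of a signed b-bit grid
        scale = x.detach().abs().max().clamp(min=1e-8)
        # Gradients reach x via the STE and reach self.bits via `levels`.
        return ste_round(x / scale * levels) / levels * scale


# Toy usage: fit quantized weights to a target while a (made-up) penalty
# on the bit-width trades accuracy against precision.
quant = AdaptiveBitQuantizer(init_bits=8.0)
w = torch.nn.Parameter(torch.randn(32, 32))
target = torch.randn(32, 32)
opt = torch.optim.SGD([w, quant.bits], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = (quant(w) - target).pow(2).mean() + 1e-3 * quant.bits
    loss.backward()
    opt.step()
print(f"learned bit-width: {quant.bits.item():.2f}")
```

The penalty term is what gives gradient descent a reason to shrink the bit-width; without some complexity- or hardware-cost term, accuracy pressure alone would keep the precision high. The weight 1e-3 above is an arbitrary trade-off constant chosen for the toy example.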
Pages: 442-446
Page count: 5