AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Cited by: 0
Authors
Gernigon, Cedric [1 ]
Filip, Silviu-Ioan [1 ]
Sentieys, Olivier [1 ]
Coggiola, Clement [2 ]
Bruno, Mickael [2 ]
Affiliations
[1] Univ Rennes, INRIA, CNRS, IRISA, F-35000 Rennes, France
[2] CNES, Spacecraft Tech, Onboard Data Handling, Toulouse, France
Keywords
Neural Network Compression; Quantization-Aware Training; Adaptive Bit-Width Optimization
DOI
10.1109/AICAS59952.2024.10595895
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, the high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to dealing with deployment constraints, but searching for optimized bit-widths can be difficult. In this work, we present Adaptive Bit-Width Quantization-Aware Training (AdaQAT), a learning-based method that automatically optimizes weight and activation bit-widths during training for more efficient DNN inference. We use relaxed real-valued bit-widths that are updated using a gradient descent rule but are otherwise discretized for all quantization operations. The result is a simple and flexible QAT approach for mixed-precision uniform quantization problems. Unlike other methods that are generally designed to run on a pretrained network, AdaQAT works well in both training-from-scratch and fine-tuning scenarios. Initial results on the CIFAR-10 and ImageNet datasets, using ResNet20 and ResNet18 models respectively, indicate that our method is competitive with other state-of-the-art mixed-precision quantization approaches.
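The core mechanism described in the abstract, a relaxed real-valued bit-width that is discretized for every quantization operation yet updated by gradient descent, can be illustrated with a short PyTorch sketch. This is a minimal, hypothetical rendering: the class AdaptiveBitQuantizer, the ste_round straight-through helper, the symmetric uniform quantizer, and the linear bit-width penalty are all assumptions made for illustration, not the paper's exact formulation.

```python
# Hypothetical sketch of adaptive bit-width QAT: a real-valued bit-width,
# rounded for every quantization operation but updated by plain gradient
# descent. Names and the STE formulation are illustrative assumptions.
import torch


def ste_round(t: torch.Tensor) -> torch.Tensor:
    """Round in the forward pass; pass gradients through unchanged."""
    return t + (t.round() - t).detach()


class AdaptiveBitQuantizer(torch.nn.Module):
    """Symmetric uniform quantizer with a learnable real-valued bit-width."""

    def __init__(self, init_bits: float = 8.0):
        super().__init__()
        self.bits = torch.nn.Parameter(torch.tensor(init_bits))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Discretize the relaxed bit-width; the clamp keeps the grid non-degenerate.
        b = ste_round(self.bits.clamp(min=2.0))
        levels = 2.0 ** (b - 1) - 1          # positive levels of a signed b-bit grid
        scale = x.detach().abs().max().clamp(min=1e-8)
        # Gradients reach x via the STE and reach self.bits via `levels`.
        return ste_round(x / scale * levels) / levels * scale


# Toy usage: fit quantized weights to a target while a (made-up) penalty
# on the bit-width trades accuracy against precision.
quant = AdaptiveBitQuantizer(init_bits=8.0)
w = torch.nn.Parameter(torch.randn(32, 32))
target = torch.randn(32, 32)
opt = torch.optim.SGD([w, quant.bits], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = (quant(w) - target).pow(2).mean() + 1e-3 * quant.bits
    loss.backward()
    opt.step()
print(f"learned bit-width: {quant.bits.item():.2f}")
```

The penalty term is what gives gradient descent a reason to shrink the bit-width; without some complexity- or hardware-cost term, accuracy pressure alone would keep the precision high. The weight 1e-3 above is an arbitrary trade-off constant chosen for the toy example.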
Pages: 442-446
Page count: 5