Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance

Cited: 29
Authors
Tang, Chen [1 ]
Ouyang, Kai [1 ]
Wang, Zhi [1 ,4 ]
Zhu, Yifei [2 ]
Ji, Wen [3 ,4 ]
Wang, Yaowei [4 ]
Zhu, Wenwu [1 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[3] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
Source
Computer Vision - ECCV 2022, Lecture Notes in Computer Science (Springer)
Funding
Beijing Natural Science Foundation;
Keywords
Mixed-precision quantization; Model compression;
DOI
10.1007/978-3-031-20083-0_16
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it difficult to determine the optimal bit-width for each layer. Previous works usually resort to iterative search methods over the training set, which consume hundreds or even thousands of GPU-hours. In this study, we reveal that certain unique learnable parameters in quantization, namely the scale factors in the quantizer, can serve as importance indicators of a layer, reflecting that layer's contribution to final accuracy at a given bit-width. These indicators naturally perceive the numerical transformation during quantization-aware training and can therefore provide precise layer-wise quantization sensitivity metrics. However, a deep network always contains hundreds of such indicators, and training them one by one would incur excessive time cost. To overcome this, we propose a joint training scheme that obtains all indicators at once, considerably speeding up indicator training by parallelizing the originally sequential training processes. With these learned importance indicators, we formulate the MPQ search as a one-time integer linear programming (ILP) problem. This avoids iterative search and drastically reduces search time without restricting the bit-width search space. For example, MPQ search on ResNet18 with our indicators takes only 0.06 s, orders of magnitude faster than iterative search methods. Extensive experiments also show that our approach achieves state-of-the-art accuracy on ImageNet for a wide range of models under various constraints (e.g., BitOps, compression rate).
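To make the one-time ILP formulation above concrete, the sketch below shows how learned per-layer importance indicators at each candidate bit-width could be fed to an off-the-shelf ILP solver to pick one bit-width per layer under a BitOps budget. This is not the authors' released implementation: the layer names, importance values, BitOps cost model, and the choice of the PuLP/CBC solver are all illustrative assumptions.

```python
# Minimal sketch of the one-time ILP search described in the abstract.
# All numbers are illustrative placeholders; in the paper the importance
# indicators come from quantizer scale factors learned during joint
# quantization-aware training.
import pulp

layers = ["conv1", "block1", "block2", "fc"]   # hypothetical layer names
bits = [2, 4, 8]                               # candidate bit-widths

# importance[l][b]: learned indicator of layer l at bit-width b (toy values).
importance = {
    "conv1":  {2: 0.10, 4: 0.55, 8: 0.90},
    "block1": {2: 0.20, 4: 0.60, 8: 0.85},
    "block2": {2: 0.30, 4: 0.70, 8: 0.80},
    "fc":     {2: 0.40, 4: 0.65, 8: 0.75},
}
# bitops[l][b]: cost of running layer l at bit-width b (toy model: base * b^2).
base_ops = {"conv1": 4.0, "block1": 3.0, "block2": 3.0, "fc": 1.0}
bitops = {l: {b: base_ops[l] * b * b for b in bits} for l in layers}
budget = 250.0                                 # BitOps constraint

prob = pulp.LpProblem("mpq_bit_allocation", pulp.LpMaximize)
# x[l, b] = 1 iff layer l is assigned bit-width b.
x = {(l, b): pulp.LpVariable(f"x_{l}_{b}", cat="Binary")
     for l in layers for b in bits}

# Objective: maximize total importance of the chosen configuration.
prob += pulp.lpSum(importance[l][b] * x[l, b] for l in layers for b in bits)
# Each layer gets exactly one bit-width.
for l in layers:
    prob += pulp.lpSum(x[l, b] for b in bits) == 1
# Total BitOps must stay within the budget.
prob += pulp.lpSum(bitops[l][b] * x[l, b] for l in layers for b in bits) <= budget

prob.solve(pulp.PULP_CBC_CMD(msg=0))
assignment = {l: b for (l, b), var in x.items() if var.value() > 0.5}
print(assignment)   # chosen bit-width per layer
```

Because the search collapses to a single small ILP over (number of layers) x (number of candidate bit-widths) binary variables, solver time is negligible, which is consistent with the 0.06 s ResNet18 search time quoted in the abstract.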
Pages: 259-275
Number of pages: 17
Related Papers
50 records in total
  • [21] Mixed-precision quantization-aware training for photonic neural networks
    Kirtas, Manos
    Passalis, Nikolaos
    Oikonomou, Athina
    Moralis-Pegios, Miltos
    Giamougiannis, George
    Tsakyridis, Apostolos
    Mourgias-Alexandris, George
    Pleros, Nikolaos
    Tefas, Anastasios
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (29): 21361-21379
  • [23] Mixed-precision quantization for neural networks based on error limit (Invited)
    Li Y.
    Guo Z.
    Liu K.
    Sun X.
    Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2022, 51 (04):
  • [24] Accuracy degradation aware bit rate allocation for layer-wise uniform quantization of weights in neural network
    Nikolic, Jelena
    Tomic, Stefan
    Peric, Zoran
    Jovanovic, Aleksandra
    Aleksic, Danijela
JOURNAL OF ELECTRICAL ENGINEERING-ELEKTROTECHNICKY CASOPIS, 2024, 75 (06): 425-434
  • [25] Mixed-precision Quantization with Dynamical Hessian Matrix for Object Detection Network
    Yang, Zerui
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021,
  • [26] Post-training deep neural network pruning via layer-wise calibration
    Lazarevich, Ivan
    Kozlov, Alexander
    Malinin, Nikita
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 798 - 805
  • [27] Entropy-Driven Mixed-Precision Quantization for Deep Network Design
    Sun, Zhenhong
    Ge, Ce
    Wang, Junyan
    Lin, Ming
    Chen, Hesen
    Li, Hao
    Sun, Xiuyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [28] Network with Sub-networks: Layer-wise Detachable Neural Network
    Fuengfusin, Ninnart
    Tamukoh, Hakaru
JOURNAL OF ROBOTICS NETWORKING AND ARTIFICIAL LIFE, 2021, 7 (04): 240-244
  • [29] Craft Distillation: Layer-wise Convolutional Neural Network Distillation
    Blakeney, Cody
    Li, Xiaomin
    Yan, Yan
    Zong, Ziliang
    2020 7TH IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND CLOUD COMPUTING (CSCLOUD 2020)/2020 6TH IEEE INTERNATIONAL CONFERENCE ON EDGE COMPUTING AND SCALABLE CLOUD (EDGECOM 2020), 2020, : 252 - 257
  • [30] New layer-wise linearized algorithm for feedforward neural network
    Tian, Chuan-Jun
    Wei, Gang
Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2001, 29 (11): 1495-1498