Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance

Cited: 29
Authors
Tang, Chen [1 ]
Ouyang, Kai [1 ]
Wang, Zhi [1 ,4 ]
Zhu, Yifei [2 ]
Ji, Wen [3 ,4 ]
Wang, Yaowei [4 ]
Zhu, Wenwu [1 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[3] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
Source
Computer Vision - ECCV 2022, Lecture Notes in Computer Science (Springer)
Funding
Beijing Natural Science Foundation;
Keywords
Mixed-precision quantization; Model compression;
DOI
10.1007/978-3-031-20083-0_16
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it difficult to determine the optimal bit-width for each layer. Previous works usually resort to iterative search methods over the training set, which consume hundreds or even thousands of GPU-hours. In this study, we reveal that certain unique learnable parameters in quantization, namely the scale factors in the quantizer, can serve as importance indicators of a layer, reflecting that layer's contribution to final accuracy at a given bit-width. These indicators naturally perceive the numerical transformation during quantization-aware training and can therefore provide precise layer-wise quantization sensitivity metrics. However, a deep network always contains hundreds of such indicators, and training them one by one would incur excessive time cost. To overcome this, we propose a joint training scheme that obtains all indicators at once, considerably speeding up indicator training by parallelizing the originally sequential training processes. With these learned importance indicators, we formulate the MPQ search as a one-time integer linear programming (ILP) problem. This avoids iterative search and drastically reduces search time without restricting the bit-width search space. For example, MPQ search on ResNet18 with our indicators takes only 0.06 s, orders of magnitude faster than iterative search methods. Extensive experiments also show that our approach achieves state-of-the-art accuracy on ImageNet for a wide range of models under various constraints (e.g., BitOps, compression rate).
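To make the one-time ILP formulation above concrete, the sketch below shows how learned per-layer importance indicators at each candidate bit-width could be fed to an off-the-shelf ILP solver to pick one bit-width per layer under a BitOps budget. This is not the authors' released implementation: the layer names, importance values, BitOps cost model, and the choice of the PuLP/CBC solver are all illustrative assumptions.

```python
# Minimal sketch of the one-time ILP search described in the abstract.
# All numbers are illustrative placeholders; in the paper the importance
# indicators come from quantizer scale factors learned during joint
# quantization-aware training.
import pulp

layers = ["conv1", "block1", "block2", "fc"]   # hypothetical layer names
bits = [2, 4, 8]                               # candidate bit-widths

# importance[l][b]: learned indicator of layer l at bit-width b (toy values).
importance = {
    "conv1":  {2: 0.10, 4: 0.55, 8: 0.90},
    "block1": {2: 0.20, 4: 0.60, 8: 0.85},
    "block2": {2: 0.30, 4: 0.70, 8: 0.80},
    "fc":     {2: 0.40, 4: 0.65, 8: 0.75},
}
# bitops[l][b]: cost of running layer l at bit-width b (toy model: base * b^2).
base_ops = {"conv1": 4.0, "block1": 3.0, "block2": 3.0, "fc": 1.0}
bitops = {l: {b: base_ops[l] * b * b for b in bits} for l in layers}
budget = 250.0                                 # BitOps constraint

prob = pulp.LpProblem("mpq_bit_allocation", pulp.LpMaximize)
# x[l, b] = 1 iff layer l is assigned bit-width b.
x = {(l, b): pulp.LpVariable(f"x_{l}_{b}", cat="Binary")
     for l in layers for b in bits}

# Objective: maximize total importance of the chosen configuration.
prob += pulp.lpSum(importance[l][b] * x[l, b] for l in layers for b in bits)
# Each layer gets exactly one bit-width.
for l in layers:
    prob += pulp.lpSum(x[l, b] for b in bits) == 1
# Total BitOps must stay within the budget.
prob += pulp.lpSum(bitops[l][b] * x[l, b] for l in layers for b in bits) <= budget

prob.solve(pulp.PULP_CBC_CMD(msg=0))
assignment = {l: b for (l, b), var in x.items() if var.value() > 0.5}
print(assignment)   # chosen bit-width per layer
```

Because the search collapses to a single small ILP over (number of layers) x (number of candidate bit-widths) binary variables, solver time is negligible, which is consistent with the 0.06 s ResNet18 search time quoted in the abstract.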
Pages: 259-275
Number of pages: 17
Related Papers
50 records in total
  • [21] Mixed-precision quantization-aware training for photonic neural networks
    Kirtas, Manos
    Passalis, Nikolaos
    Oikonomou, Athina
    Moralis-Pegios, Miltos
    Giamougiannis, George
    Tsakyridis, Apostolos
    Mourgias-Alexandris, George
    Pleros, Nikolaos
    Tefas, Anastasios
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (29): 21361-21379
  • [23] Mixed-precision quantization for neural networks based on error limit (Invited)
    Li Y.
    Guo Z.
    Liu K.
    Sun X.
    Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2022, 51 (04):
  • [24] Accuracy degradation aware bit rate allocation for layer-wise uniform quantization of weights in neural network
    Nikolic, Jelena
    Tomic, Stefan
    Peric, Zoran
    Jovanovic, Aleksandra
    Aleksic, Danijela
JOURNAL OF ELECTRICAL ENGINEERING-ELEKTROTECHNICKY CASOPIS, 2024, 75 (06): 425-434
  • [25] Mixed-precision Quantization with Dynamical Hessian Matrix for Object Detection Network
    Yang, Zerui
    Fei, Wen
    Dai, Wenrui
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021,
  • [26] Post-training deep neural network pruning via layer-wise calibration
    Lazarevich, Ivan
    Kozlov, Alexander
    Malinin, Nikita
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 798 - 805
  • [27] Entropy-Driven Mixed-Precision Quantization for Deep Network Design
    Sun, Zhenhong
    Ge, Ce
    Wang, Junyan
    Lin, Ming
    Chen, Hesen
    Li, Hao
    Sun, Xiuyu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [28] Network with Sub-networks: Layer-wise Detachable Neural Network
    Fuengfusin, Ninnart
    Tamukoh, Hakaru
JOURNAL OF ROBOTICS NETWORKING AND ARTIFICIAL LIFE, 2021, 7 (04): 240-244
  • [29] Craft Distillation: Layer-wise Convolutional Neural Network Distillation
    Blakeney, Cody
    Li, Xiaomin
    Yan, Yan
    Zong, Ziliang
    2020 7TH IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND CLOUD COMPUTING (CSCLOUD 2020)/2020 6TH IEEE INTERNATIONAL CONFERENCE ON EDGE COMPUTING AND SCALABLE CLOUD (EDGECOM 2020), 2020, : 252 - 257
  • [30] New layer-wise linearized algorithm for feedforward neural network
    Tian, Chuan-Jun
    Wei, Gang
Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2001, 29 (11): 1495-1498