Accurate Post Training Quantization With Small Calibration Sets

Cited: 0
Authors
Hubara, Itay [1,2]
Nahshan, Yury [1]
Hanani, Yair [1]
Banner, Ron [1]
Soudry, Daniel [2]
Affiliations
[1] Habana Labs, Caesarea, Israel
[2] Technion, Dept Elect Engn, Haifa, Israel
Funding
Israel Science Foundation
Keywords
(none listed)
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Lately, post-training quantization methods have gained considerable attention, as they are simple to use and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant over-fitting. Instead, these methods use the calibration set only to set the activations' dynamic ranges. However, such methods have always resulted in significant accuracy degradation when used below 8 bits (except on small datasets). Here we aim to break the 8-bit barrier. To this end, we minimize the quantization errors of each layer or block separately by optimizing its parameters over the calibration set. We empirically demonstrate that this approach is: (1) much less susceptible to over-fitting than standard fine-tuning approaches, and can be used even on a very small calibration set; and (2) more powerful than previous methods, which only set the activations' dynamic ranges. We suggest two flavors of our method, parallel and sequential, aimed at fixed and flexible bit-width allocation, respectively. For the latter, we demonstrate how to optimally allocate the bit-width of each layer, while constraining accuracy degradation or model compression, by proposing a novel integer programming formulation. Finally, we suggest tuning the model's global statistics to correct biases introduced during quantization. Together, these methods yield state-of-the-art results for both vision and text models. For instance, on ResNet50 we obtain less than 1% accuracy degradation with 4-bit weights and activations in all layers but the first and last. The suggested methods are two orders of magnitude faster than the traditional quantization-aware training approach used for lower-than-8-bit quantization. We open-sourced our code at https://github.com/papers-submission/CalibTIP.
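The core idea of the abstract, minimizing each layer's quantization error over the calibration set, can be illustrated with a minimal sketch. The snippet below is an assumption-laden simplification, not the authors' released CalibTIP implementation: it learns only a per-layer weight step size for a single nn.Linear (the paper's method also adjusts further parameters and handles activations), by matching the quantized layer's output to its full-precision output on a small calibration batch.

```python
# Minimal sketch of per-layer post-training calibration, assuming a
# uniform symmetric weight quantizer and a single nn.Linear layer.
# Illustrative only; not the authors' CalibTIP implementation.
import torch
import torch.nn.functional as F

def fake_quant(w, step, n_bits=4):
    # Uniform symmetric fake-quantization; round() uses a
    # straight-through estimator so gradients reach `step`.
    qmax = 2 ** (n_bits - 1) - 1
    v = w / step
    v = (torch.round(v) - v).detach() + v   # STE for rounding
    v = torch.clamp(v, -qmax - 1, qmax)
    return v * step

def calibrate_layer(layer, calib_x, n_bits=4, iters=200, lr=1e-3):
    """Learn a weight step size minimizing the layer's output MSE."""
    for p in layer.parameters():
        p.requires_grad_(False)             # only the step size is trained
    with torch.no_grad():
        target = layer(calib_x)             # full-precision reference output
        init = layer.weight.abs().max() / (2 ** (n_bits - 1) - 1)
    step = torch.nn.Parameter(init.clone())
    opt = torch.optim.Adam([step], lr=lr)
    for _ in range(iters):
        w_q = fake_quant(layer.weight, step, n_bits)
        loss = F.mse_loss(F.linear(calib_x, w_q, layer.bias), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return step.detach()

# A few hundred unlabeled samples suffice for this per-layer objective.
layer = torch.nn.Linear(64, 64)
calib_x = torch.randn(256, 64)              # stand-in calibration set
step = calibrate_layer(layer, calib_x)
```

Because the objective is local to one layer, it has far fewer free parameters than end-to-end fine-tuning, which is why such calibration resists over-fitting even on tiny sets.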
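The abstract's integer programming formulation for flexible bit-width allocation can likewise be sketched as a small knapsack-style problem. The per-layer error and size numbers below are made up for illustration; the paper derives such quantities from the calibration set and solves a proper integer program, whereas this toy version simply enumerates all assignments.

```python
# Toy sketch of bit-width allocation as an integer program: pick one
# bit-width per layer to minimize summed quantization error under a
# model-size budget. All numbers are illustrative, not measured.
from itertools import product

BITS = (2, 4, 8)
# Assumed per-layer error estimates, indexed by bit-width.
err = [
    {2: 0.90, 4: 0.20, 8: 0.02},
    {2: 0.50, 4: 0.10, 8: 0.01},
    {2: 0.70, 4: 0.15, 8: 0.02},
]
n_params = [1000, 2000, 1500]           # parameter count per layer
budget = 4.5 * sum(n_params)            # 4.5 bits per weight on average

best_err, best_choice = float("inf"), None
for choice in product(BITS, repeat=len(err)):   # exhaustive: fine for 3 layers
    size = sum(b * n for b, n in zip(choice, n_params))
    if size > budget:
        continue                        # violates the compression constraint
    total = sum(e[b] for e, b in zip(err, choice))
    if total < best_err:
        best_err, best_choice = total, choice

print("per-layer bit-widths:", best_choice, "| total error:", best_err)
```

For real networks with many layers, an off-the-shelf ILP solver would replace the enumeration; the structure of the problem (one discrete choice per layer, a global budget constraint) stays the same.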
Pages: 10
Related Papers (50 total)
  • [21] Improved kNN Rule for Small Training Sets
    Cheamanunkul, Sunsern
    Freund, Yoav
    2014 13TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2014, : 201 - 206
  • [22] Post-training Quantization of Deep Neural Network Weights
    Khayrov, E. M.
    Malsagov, M. Yu.
    Karandashev, I. M.
    ADVANCES IN NEURAL COMPUTATION, MACHINE LEARNING, AND COGNITIVE RESEARCH III, 2020, 856 : 230 - 238
  • [23] Normalized Post-training Quantization for Photonic Neural Networks
    Kirtas, M.
    Passalis, N.
    Oikonomou, A.
    Mourgias-Alexandris, G.
    Moralis-Pegios, M.
    Pleros, N.
    Tefas, A.
    2022 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2022, : 657 - 663
  • [24] Post-training Quantization for Neural Networks with Provable Guarantees
    Zhang, Jinjie
    Zhou, Yixuan
    Saab, Rayan
    SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2023, 5 (02): : 373 - 399
  • [25] Post-training Quantization for Vision Transformer in Transformed Domain
    Feng, Kai
    Chen, Zhuo
    Gao, Fei
    Wang, Zhe
    Xu, Long
    Lin, Weisi
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1457 - 1462
  • [26] Post-training Quantization Methods for Deep Learning Models
    Kluska, Piotr
    Zieba, Maciej
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT I, 2020, 12033 : 467 - 479
  • [27] Leveraging Inter-Layer Dependency for Post-Training Quantization
    Wang, Changbao
    Zheng, Dandan
    Liu, Yuanliu
    Li, Liang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [28] MATAR: Multi-Quantization-Aware Training for Accurate and Fast Hardware Retargeting
    Mori, Pierpaolo
    Thoma, Moritz
    Frickenstein, Lukas
    Sampath, Shambhavi Balamuthu
    Fasfous, Nael
    Vemparala, Manoj Rohit
    Frickenstein, Alexander
    Stechele, Walter
    Mueller-Gritschneder, Daniel
    Passerone, Claudio
    2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2024,
  • [29] Quantization-Aware In-situ Training for Reliable and Accurate Edge AI
    de Lima, Joao Paulo C.
    Carro, Luigi
    PROCEEDINGS OF THE 2022 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2022), 2022, : 1497 - 1502
  • [30] Inverse vs. classical calibration for small data sets
    Tellinghuisen, J
    FRESENIUS JOURNAL OF ANALYTICAL CHEMISTRY, 2000, 368 (06): : 585 - 588