Accurate Post Training Quantization With Small Calibration Sets

Cited by: 0
Authors
Hubara, Itay [1 ,2 ]
Nahshan, Yury [1 ]
Hanani, Yair [1 ]
Banner, Ron [1 ]
Soudry, Daniel [2 ]
Affiliations
[1] Habana Labs, Caesarea, Israel
[2] Technion, Dept Elect Engn, Haifa, Israel
Funding
Israel Science Foundation
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Lately, post-training quantization methods have gained considerable attention, as they are simple to use and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant over-fitting. Instead, these methods use the calibration set only to set the activations' dynamic ranges. However, such methods have always resulted in significant accuracy degradation when used below 8 bits (except on small datasets). Here we aim to break the 8-bit barrier. To this end, we minimize the quantization errors of each layer or block separately by optimizing its parameters over the calibration set. We empirically demonstrate that this approach is (1) much less susceptible to over-fitting than standard fine-tuning approaches and can be used even on a very small calibration set, and (2) more powerful than previous methods, which only set the activations' dynamic ranges. We suggest two flavors of our method, parallel and sequential, aimed at fixed and flexible bit-width allocation, respectively. For the latter, we propose a novel integer programming formulation and demonstrate how to optimally allocate the bit-width of each layer while constraining either accuracy degradation or model compression. Finally, we suggest tuning the model's global statistics to correct biases introduced during quantization. Together, these methods yield state-of-the-art results for both vision and text models. For instance, on ResNet50 we obtain less than 1% accuracy degradation with 4-bit weights and activations in all layers except the first and last. The suggested methods are two orders of magnitude faster than the traditional quantization-aware training approach used for lower-than-8-bit quantization. Our code is open-sourced at https://github.com/papers-submission/CalibTIP.
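The abstract's central step, minimizing each layer's quantization error over the small calibration set, can be illustrated with a short sketch. The code below is a minimal PyTorch illustration under stated assumptions, not the authors' released CalibTIP implementation: the helper names (`fake_quant`, `calibrate_layer`), the symmetric uniform quantizer, the straight-through estimator, and the hyper-parameters are all illustrative choices.

```python
import torch


def fake_quant(w: torch.Tensor, scale: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Symmetric uniform fake-quantization (illustrative, not the paper's exact quantizer)."""
    qmax = 2 ** (n_bits - 1) - 1
    w_s = w / scale
    # Straight-through estimator for rounding; gradients still flow to `scale`.
    q = torch.clamp(w_s + (torch.round(w_s) - w_s).detach(), -qmax - 1, qmax)
    return q * scale


def calibrate_layer(layer: torch.nn.Linear, x_calib: torch.Tensor,
                    n_bits: int = 4, steps: int = 500, lr: float = 1e-3) -> torch.nn.Linear:
    """Optimize one layer's weights, bias, and quantization scale so that the
    quantized layer reproduces the full-precision outputs on the calibration set.
    Assumes the layer has a bias; hyper-parameters are illustrative."""
    with torch.no_grad():
        y_ref = layer(x_calib)                                   # full-precision targets
        scale0 = layer.weight.abs().max() / (2 ** (n_bits - 1) - 1)

    w = layer.weight.detach().clone().requires_grad_(True)       # continuous proxy weights
    b = layer.bias.detach().clone().requires_grad_(True)
    scale = scale0.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([w, b, scale], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        y_q = torch.nn.functional.linear(x_calib, fake_quant(w, scale, n_bits), b)
        loss = torch.mean((y_q - y_ref) ** 2)                    # per-layer quantization error
        loss.backward()
        opt.step()

    with torch.no_grad():                                        # bake the optimized values back
        layer.weight.copy_(fake_quant(w, scale, n_bits))
        layer.bias.copy_(b)
    return layer
```

As a usage sketch, `calibrate_layer(torch.nn.Linear(512, 128), torch.randn(256, 512))` calibrates a single layer on 256 unlabeled samples. In the paper's sequential flavor, each calibrated block would receive the quantized outputs of the preceding blocks, while the parallel flavor optimizes all layers independently from the full-precision activations; the integer-programming bit-width allocation and global statistics tuning described in the abstract are not shown here.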
Pages: 10