4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Cited by: 3
Authors
Ding, Shaojin [1]
Meadowlark, Phoenix [1]
He, Yanzhang [1]
Lew, Lukasz [1]
Agrawal, Shivani [1]
Rybakov, Oleg [1]
Affiliations
[1] Google LLC, Mountain View, CA 94043 USA
Source
INTERSPEECH 2022
Keywords
speech recognition; model quantization; 4-bit quantization
DOI
10.21437/Interspeech.2022-10809
CLC Number
O42 [Acoustics]
Subject Classification Number
070206; 082403
Abstract
Reducing latency and model size has long been a significant research problem for live Automatic Speech Recognition (ASR) applications. Along this direction, model quantization has become an increasingly popular approach to compress neural networks and reduce computation cost. Most existing practical ASR systems apply post-training 8-bit quantization. To achieve a higher compression rate without introducing additional performance regression, in this study we propose to develop 4-bit ASR models with native quantization aware training, which leverages native integer operations to effectively optimize both training and inference. We conducted two experiments on state-of-the-art Conformer-based ASR models to evaluate our proposed quantization technique. First, we explored the impact of different precisions for both weight and activation quantization on the LibriSpeech dataset, and obtained a lossless 4-bit Conformer model with a 7.7x size reduction compared to the float32 model. Second, we investigated, for the first time, the viability of 4-bit quantization on a practical ASR system trained with large-scale datasets, and produced a lossless Conformer ASR model with mixed 4-bit and 8-bit weights and a 5x size reduction compared to the float32 model.
Pages: 1711-1715
Page count: 5
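
To make the quantization idea in the abstract concrete, below is a minimal JAX sketch of 4-bit quantization aware training using the common simulated-quantization recipe with a straight-through estimator (STE). This is an illustration only: the function names (fake_quantize_int4, quantized_dense) are hypothetical, the quantizer is a simple symmetric per-tensor scheme, and the paper's "native" QAT goes further by executing true integer operations during training and inference, which this sketch does not reproduce.

```python
import jax
import jax.numpy as jnp

def fake_quantize_int4(x):
    """Simulated symmetric per-tensor 4-bit quantization with an STE."""
    # Map x onto the integer grid [-7, 7]; dropping -8 keeps the range
    # symmetric around zero.
    max_int = 7.0
    scale = jnp.max(jnp.abs(x)) / max_int + 1e-8
    q = jnp.clip(jnp.round(x / scale), -max_int, max_int)
    dequant = q * scale
    # Straight-through estimator: the forward pass sees the quantized
    # value, but the backward pass treats quantization as identity.
    return x + jax.lax.stop_gradient(dequant - x)

def quantized_dense(params, x):
    """A dense layer trained with 4-bit weight quantization in the loop."""
    # After training, only the rounded integers plus one scale per tensor
    # need to be stored, which is roughly where a near-8x size reduction
    # over float32 weights comes from.
    w_q = fake_quantize_int4(params["w"])
    return x @ w_q + params["b"]

# Usage: gradients flow through the rounding thanks to the STE.
key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (16, 8)), "b": jnp.zeros(8)}
x = jax.random.normal(key, (4, 16))
loss_fn = lambda p: jnp.mean(quantized_dense(p, x) ** 2)
grads = jax.grad(loss_fn)(params)
```

The mixed 4-bit/8-bit result reported in the abstract would correspond, in a sketch like this, to selecting the quantizer's bit width per layer (e.g., keeping quantization-sensitive layers at 8 bits) rather than applying 4 bits uniformly.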