4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Cited by: 3
Authors
Ding, Shaojin [1 ]
Meadowlark, Phoenix [1 ]
He, Yanzhang [1 ]
Lew, Lukasz [1 ]
Agrawal, Shivani [1 ]
Rybakov, Oleg [1 ]
Affiliations
[1] Google LLC, Mountain View, CA 94043 USA
Keywords
speech recognition; model quantization; 4-bit quantization;
DOI
10.21437/Interspeech.2022-10809
CLC classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Reducing latency and model size has long been a significant research problem for live Automatic Speech Recognition (ASR) applications. Along this direction, model quantization has become an increasingly popular approach to compress neural networks and reduce computation cost. Most existing practical ASR systems apply post-training 8-bit quantization. To achieve a higher compression rate without introducing additional performance regression, in this study we develop 4-bit ASR models with native quantization aware training, which leverages native integer operations to effectively optimize both training and inference. We conducted two experiments on state-of-the-art Conformer-based ASR models to evaluate the proposed quantization technique. First, we explored the impact of different precisions for both weight and activation quantization on the LibriSpeech dataset, and obtained a lossless 4-bit Conformer model with 7.7x size reduction compared to the float32 model. Second, we investigated for the first time the viability of 4-bit quantization on a practical ASR system trained with large-scale datasets, and produced a lossless Conformer ASR model with mixed 4-bit and 8-bit weights that achieves 5x size reduction compared to the float32 model.
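The abstract does not spell out the quantization scheme, but the arithmetic behind the reported compression is straightforward: 4-bit weights against float32 give at most 32/4 = 8x size reduction (the paper reports 7.7x, consistent with a small fraction of parameters kept at higher precision), and a 4-bit/8-bit mix lands at the reported 5x. As a rough illustration only, a common building block of quantization aware training is symmetric per-tensor fake quantization, sketched below in NumPy; the function name and the [-7, 7] integer grid are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fake_quantize_4bit(w):
    """Illustrative symmetric per-tensor 4-bit fake quantization (assumption,
    not the paper's exact scheme).

    Weights are mapped to signed integers in [-7, 7] (2^(4-1) - 1, keeping the
    grid symmetric around zero), then dequantized back to floats. In QAT the
    forward pass uses these quantized values while the backward pass treats
    rounding as identity (straight-through estimator).
    """
    max_int = 7
    scale = np.max(np.abs(w)) / max_int
    if scale == 0.0:
        return w.copy(), 1.0
    q = np.clip(np.round(w / scale), -max_int, max_int)  # integer codes
    return q * scale, scale  # dequantized weights and the per-tensor scale

w = np.array([-1.0, -0.3, 0.0, 0.5, 1.0])
w_q, scale = fake_quantize_4bit(w)
# Round-to-nearest keeps the per-element error within half a quantization step.
```

With round-to-nearest, every element of `w_q` lies within `scale / 2` of the original weight, which is why aggressive low-bit schemes can stay lossless when the weight distributions are well-behaved.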
Pages: 1711-1715 (5 pages)