4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Cited by: 3
Authors
Ding, Shaojin [1 ]
Meadowlark, Phoenix [1 ]
He, Yanzhang [1 ]
Lew, Lukasz [1 ]
Agrawal, Shivani [1 ]
Rybakov, Oleg [1 ]
Affiliations
[1] Google LLC, Mountain View, CA 94043 USA
Keywords
speech recognition; model quantization; 4-bit quantization;
DOI
10.21437/Interspeech.2022-10809
CLC classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Reducing latency and model size has long been a significant research problem for live Automatic Speech Recognition (ASR) applications. Along this direction, model quantization has become an increasingly popular approach to compress neural networks and reduce computation cost. Most existing practical ASR systems apply post-training 8-bit quantization. To achieve a higher compression rate without introducing additional performance regression, in this study we develop 4-bit ASR models with native quantization aware training, which leverages native integer operations to effectively optimize both training and inference. We conducted two experiments on state-of-the-art Conformer-based ASR models to evaluate the proposed quantization technique. First, we explored the impact of different precisions for both weight and activation quantization on the LibriSpeech dataset, and obtained a lossless 4-bit Conformer model with 7.7x size reduction compared to the float32 model. Second, we investigated for the first time the viability of 4-bit quantization on a practical ASR system trained with large-scale datasets, and produced a lossless Conformer ASR model with mixed 4-bit and 8-bit weights that achieves 5x size reduction compared to the float32 model.
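The abstract does not spell out the quantization scheme, but the arithmetic behind the reported compression is straightforward: 4-bit weights against float32 give at most 32/4 = 8x size reduction (the paper reports 7.7x, consistent with a small fraction of parameters kept at higher precision), and a 4-bit/8-bit mix lands at the reported 5x. As a rough illustration only, a common building block of quantization aware training is symmetric per-tensor fake quantization, sketched below in NumPy; the function name and the [-7, 7] integer grid are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def fake_quantize_4bit(w):
    """Illustrative symmetric per-tensor 4-bit fake quantization (assumption,
    not the paper's exact scheme).

    Weights are mapped to signed integers in [-7, 7] (2^(4-1) - 1, keeping the
    grid symmetric around zero), then dequantized back to floats. In QAT the
    forward pass uses these quantized values while the backward pass treats
    rounding as identity (straight-through estimator).
    """
    max_int = 7
    scale = np.max(np.abs(w)) / max_int
    if scale == 0.0:
        return w.copy(), 1.0
    q = np.clip(np.round(w / scale), -max_int, max_int)  # integer codes
    return q * scale, scale  # dequantized weights and the per-tensor scale

w = np.array([-1.0, -0.3, 0.0, 0.5, 1.0])
w_q, scale = fake_quantize_4bit(w)
# Round-to-nearest keeps the per-element error within half a quantization step.
```

With round-to-nearest, every element of `w_q` lies within `scale / 2` of the original weight, which is why aggressive low-bit schemes can stay lossless when the weight distributions are well-behaved.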
Pages: 1711-1715 (5 pages)