4-bit Conformer with Native Quantization Aware Training for Speech Recognition

Cited by: 3
Authors
Ding, Shaojin [1]
Meadowlark, Phoenix [1]
He, Yanzhang [1]
Lew, Lukasz [1]
Agrawal, Shivani [1]
Rybakov, Oleg [1]
Affiliations
[1] Google LLC, Mountain View, CA 94043 USA
Source
INTERSPEECH 2022
Keywords
speech recognition; model quantization; 4-bit quantization
DOI
10.21437/Interspeech.2022-10809
CLC Number
O42 [Acoustics]
Subject Classification Number
070206; 082403
Abstract
Reducing latency and model size has long been a significant research problem for live Automatic Speech Recognition (ASR) applications. Along this direction, model quantization has become an increasingly popular approach to compress neural networks and reduce computation cost. Most existing practical ASR systems apply post-training 8-bit quantization. To achieve a higher compression rate without introducing additional performance regression, in this study we propose to develop 4-bit ASR models with native quantization aware training, which leverages native integer operations to effectively optimize both training and inference. We conducted two experiments on state-of-the-art Conformer-based ASR models to evaluate our proposed quantization technique. First, we explored the impact of different precisions for both weight and activation quantization on the LibriSpeech dataset, and obtained a lossless 4-bit Conformer model with a 7.7x size reduction compared to the float32 model. Second, we investigated, for the first time, the viability of 4-bit quantization on a practical ASR system trained with large-scale datasets, and produced a lossless Conformer ASR model with mixed 4-bit and 8-bit weights and a 5x size reduction compared to the float32 model.
Pages: 1711-1715
Page count: 5
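
To make the quantization idea in the abstract concrete, below is a minimal JAX sketch of 4-bit quantization aware training using the common simulated-quantization recipe with a straight-through estimator (STE). This is an illustration only: the function names (fake_quantize_int4, quantized_dense) are hypothetical, the quantizer is a simple symmetric per-tensor scheme, and the paper's "native" QAT goes further by executing true integer operations during training and inference, which this sketch does not reproduce.

```python
import jax
import jax.numpy as jnp

def fake_quantize_int4(x):
    """Simulated symmetric per-tensor 4-bit quantization with an STE."""
    # Map x onto the integer grid [-7, 7]; dropping -8 keeps the range
    # symmetric around zero.
    max_int = 7.0
    scale = jnp.max(jnp.abs(x)) / max_int + 1e-8
    q = jnp.clip(jnp.round(x / scale), -max_int, max_int)
    dequant = q * scale
    # Straight-through estimator: the forward pass sees the quantized
    # value, but the backward pass treats quantization as identity.
    return x + jax.lax.stop_gradient(dequant - x)

def quantized_dense(params, x):
    """A dense layer trained with 4-bit weight quantization in the loop."""
    # After training, only the rounded integers plus one scale per tensor
    # need to be stored, which is roughly where a near-8x size reduction
    # over float32 weights comes from.
    w_q = fake_quantize_int4(params["w"])
    return x @ w_q + params["b"]

# Usage: gradients flow through the rounding thanks to the STE.
key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (16, 8)), "b": jnp.zeros(8)}
x = jax.random.normal(key, (4, 16))
loss_fn = lambda p: jnp.mean(quantized_dense(p, x) ** 2)
grads = jax.grad(loss_fn)(params)
```

The mixed 4-bit/8-bit result reported in the abstract would correspond, in a sketch like this, to selecting the quantizer's bit width per layer (e.g., keeping quantization-sensitive layers at 8 bits) rather than applying 4 bits uniformly.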