2-bit Conformer quantization for automatic speech recognition

Cited by: 0

Authors:

Rybakov, Oleg [1]

Meadowlark, Phoenix [1]

Ding, Shaojin [1]

Qiu, David [1]

Li, Jian [1]

Rim, David [1]

He, Yanzhang [1]

Affiliation:

[1] Google Res, Mountain View, CA 94043 USA

Source:

INTERSPEECH 2023 | 2023

Keywords:

speech recognition; model quantization; low-bit quantization

DOI:

10.21437/Interspeech.2023-1012

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Large speech models are rapidly gaining traction in the research community. As a result, model compression has become an important topic, so that these models can fit in memory and be served at reduced cost. Practical approaches for compressing automatic speech recognition (ASR) models use int8 or int4 weight quantization. In this study, we propose to develop 2-bit ASR models. We explore the impact of symmetric and asymmetric quantization combined with sub-channel quantization and clipping on both the LibriSpeech dataset and large-scale training data. We obtain a lossless 2-bit Conformer model with a 32% model size reduction compared to the state-of-the-art 4-bit Conformer model for LibriSpeech. With the large-scale training data, we obtain a 2-bit Conformer model with over 40% model size reduction against the 4-bit version, at the cost of a 17% relative word error rate degradation.
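
The abstract names four ingredients: symmetric quantization, asymmetric quantization, sub-channel quantization, and clipping. The following is a minimal, illustrative sketch of how these ideas might fit together for a single weight row. It is not the authors' implementation. All function names (`quantize_group`, `dequantize_group`, `quantize_channel`, `reconstruct`, `mse`), the `clip` parameter, and the specific choice of shrinking the range around its midpoint are assumptions made for illustration only.

```python
def quantize_group(values, bits=2, symmetric=False, clip=1.0):
    """Quantize one group of weights; returns (codes, scale, offset).

    Assumption: `clip` in (0, 1] shrinks the quantization range so that
    outliers saturate instead of stretching the grid.
    """
    levels = 2 ** bits
    if symmetric:
        # Signed grid centered on zero, e.g. {-2, -1, 0, 1} for 2 bits.
        qmin, qmax = -(levels // 2), levels // 2 - 1
        max_abs = clip * max(abs(v) for v in values)
        scale = max_abs / (levels // 2) if max_abs > 0 else 1.0
        offset = 0.0
    else:
        # Unsigned grid {0, ..., 2^bits - 1} shifted by an offset.
        qmin, qmax = 0, levels - 1
        mid = (max(values) + min(values)) / 2
        half = clip * (max(values) - min(values)) / 2
        scale = 2 * half / qmax if half > 0 else 1.0
        offset = mid - half
    # Round to the nearest grid point, then saturate to the grid limits.
    codes = [min(qmax, max(qmin, round((v - offset) / scale))) for v in values]
    return codes, scale, offset


def dequantize_group(codes, scale, offset):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + offset for c in codes]


def quantize_channel(row, group_size, **kwargs):
    """Sub-channel quantization: each slice of `group_size` weights in a
    channel gets its own scale and offset."""
    return [quantize_group(row[i:i + group_size], **kwargs)
            for i in range(0, len(row), group_size)]


def reconstruct(groups):
    """Dequantize every group and concatenate the results."""
    out = []
    for codes, scale, offset in groups:
        out.extend(dequantize_group(codes, scale, offset))
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

In this sketch, a row whose first half holds small weights and whose second half holds large ones is reconstructed far more accurately with `quantize_channel(row, 4)` than with a single group spanning the whole row, because each group fits its four 2-bit levels to its own value range. The trade-off is that every group stores its own scale and offset, so smaller groups add metadata on top of the 2-bit codes.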

Pages: 4908 - 4912

Page count: 5

50 items in total

[31] 2-BIT MICROCOMPUTER FOR EDUCATIONAL USE
KANG, RI
SHONO, K
MICROPROCESSORS AND MICROSYSTEMS, 1991, 15 (06) : 299 - 304
[32] Conformer: Convolution-augmented Transformer for Speech Recognition
Gulati, Anmol
Qin, James
Chiu, Chung-Cheng
Parmar, Niki
Zhang, Yu
Yu, Jiahui
Han, Wei
Wang, Shibo
Zhang, Zhengdong
Wu, Yonghui
Pang, Ruoming
INTERSPEECH 2020, 2020, : 5036 - 5040
[33] DFTerNet: Towards 2-bit Dynamic Fusion Networks for Accurate Human Activity Recognition
Yang, Zhan
Raymond, Osolo Ian
Zhang, Chengyuan
Wan, Ying
Long, Jun
IEEE ACCESS, 2018, 6 : 56750 - 56764
[34] Speech Recognition of Accented Mandarin Based on Improved Conformer
Yang, Xing-Yao
Zhang, Shao-Dong
Xiao, Rui
Yu, Jiong
Li, Zi-Yang
SENSORS, 2023, 23 (08)
[35] Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition
Xu, Junhao
Yu, Jianwei
Hu, Shoukang
Liu, Xunying
Meng, Helen
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3679 - 3693
[36] A Novel Approach for Vietnamese Speech Recognition Using Conformer
Tuan, Nguyen Van Anh
Hoa, Nguyen Thi Thanh
Dat, Nguyen Thanh
Tuan, Pham Minh
Truong, Dao Duy
Phuc, Dang Thi
FUTURE DATA AND SECURITY ENGINEERING. BIG DATA, SECURITY AND PRIVACY, SMART CITY AND INDUSTRY 4.0 APPLICATIONS, FDSE 2022, 2022, 1688 : 723 - 730
[37] 2 USEFUL TECHNIQUES FOR AUTOMATIC SPEECH RECOGNITION
COKER, CH
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 64 : S179 - S179
[38] Automatic speech recognition
O'Shaughnessy, Douglas
2015 CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies (CHILECON), 2015, : 417 - 424
[39] Vector-Quantization based Mask Estimation for Missing Data Automatic Speech Recognition
Van Segbroeck, Maarten
Van Hamme, Hugo
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1825 - 1828
[40] AUTOMATIC SPEECH RECOGNITION
IVALL, T
ELECTRONICS & WIRELESS WORLD, 1984, 90 (1581): : 73 - 76