2-bit Conformer quantization for automatic speech recognition

Cited by: 0
Authors
Rybakov, Oleg [1]
Meadowlark, Phoenix [1]
Ding, Shaojin [1]
Qiu, David [1]
Li, Jian [1]
Rim, David [1]
He, Yanzhang [1]
Affiliations
[1] Google Research, Mountain View, CA 94043 USA
Source
INTERSPEECH 2023
Keywords
speech recognition; model quantization; low-bit quantization
DOI
10.21437/Interspeech.2023-1012
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Large speech models are rapidly gaining traction in the research community. As a result, model compression has become an important topic, so that these models can fit in memory and be served at reduced cost. Practical approaches for compressing automatic speech recognition (ASR) models use int8 or int4 weight quantization. In this study, we propose to develop 2-bit ASR models. We explore the impact of symmetric and asymmetric quantization, combined with sub-channel quantization and clipping, on both the LibriSpeech dataset and large-scale training data. We obtain a lossless 2-bit Conformer model with a 32% model size reduction compared to the state-of-the-art 4-bit Conformer model for LibriSpeech. With the large-scale training data, we obtain a 2-bit Conformer model with over 40% model size reduction against the 4-bit version, at the cost of a 17% relative word error rate degradation.
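The abstract names symmetric and asymmetric quantization, sub-channel quantization, and clipping, but this record does not give the paper's formulation. The following is only a minimal sketch of one common way these pieces fit together, written in plain Python. The function names, the `clip` ratio, the choice of scale, and the block layout are all illustrative assumptions, not the authors' implementation.

```python
from typing import List


def fake_quant_block(block: List[float], bits: int = 2,
                     symmetric: bool = True, clip: float = 1.0) -> List[float]:
    """Quantize one block to `bits` bits (bits >= 2) and map it back to floats.

    Assumption: `clip` in (0, 1] shrinks the quantization range, so the
    largest-magnitude weights saturate instead of stretching the grid.
    """
    if symmetric:
        # Signed grid with no offset; for 2 bits the codes are -2, -1, 0, 1.
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        zero = 0.0
        span = clip * max(abs(w) for w in block)
        scale = span / qmax if span > 0 else 1.0
    else:
        # Unsigned grid shifted by the (clipped) minimum; 2-bit codes are 0..3.
        qmin, qmax = 0, 2 ** bits - 1
        lo, hi = clip * min(block), clip * max(block)
        zero = lo
        scale = (hi - lo) / qmax if hi > lo else 1.0
    # Round to the nearest code, then saturate to the representable range.
    codes = [min(qmax, max(qmin, round((w - zero) / scale))) for w in block]
    return [zero + c * scale for c in codes]


def fake_quant_channel(channel: List[float], block_size: int,
                       **kwargs) -> List[float]:
    """Sub-channel quantization: each block of `block_size` weights
    within a channel gets its own scale (and offset, if asymmetric)."""
    out: List[float] = []
    for start in range(0, len(channel), block_size):
        out.extend(fake_quant_block(channel[start:start + block_size], **kwargs))
    return out


def mean_abs_error(a: List[float], b: List[float]) -> float:
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Toy channel: small weights followed by large ones.
    channel = [-0.3, -0.1, 0.1, 0.3, -3.0, -1.0, 1.0, 3.0]
    for size in (8, 4):
        for sym in (True, False):
            q = fake_quant_channel(channel, size, symmetric=sym)
            print(size, "symmetric" if sym else "asymmetric",
                  mean_abs_error(channel, q))
```

In this toy case, giving each block of four weights its own scale lets the small weights use the full 2-bit grid instead of sharing a range set by the large ones. The extra per-block scales take storage of their own, which is one reason a 2-bit model need not be exactly half the size of a 4-bit one.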
Pages: 4908 - 4912
Page count: 5
Related papers
50 in total
  • [31] 2-BIT MICROCOMPUTER FOR EDUCATIONAL USE
    KANG, RI
    SHONO, K
    MICROPROCESSORS AND MICROSYSTEMS, 1991, 15 (06) : 299 - 304
  • [32] Conformer: Convolution-augmented Transformer for Speech Recognition
    Gulati, Anmol
    Qin, James
    Chiu, Chung-Cheng
    Parmar, Niki
    Zhang, Yu
    Yu, Jiahui
    Han, Wei
    Wang, Shibo
    Zhang, Zhengdong
    Wu, Yonghui
    Pang, Ruoming
    INTERSPEECH 2020, 2020, : 5036 - 5040
  • [33] DFTerNet: Towards 2-bit Dynamic Fusion Networks for Accurate Human Activity Recognition
    Yang, Zhan
    Raymond, Osolo Ian
    Zhang, Chengyuan
    Wan, Ying
    Long, Jun
    IEEE ACCESS, 2018, 6 : 56750 - 56764
  • [34] Speech Recognition of Accented Mandarin Based on Improved Conformer
    Yang, Xing-Yao
    Zhang, Shao-Dong
    Xiao, Rui
    Yu, Jiong
    Li, Zi-Yang
    SENSORS, 2023, 23 (08)
  • [35] Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition
    Xu, Junhao
    Yu, Jianwei
    Hu, Shoukang
    Liu, Xunying
    Meng, Helen
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3679 - 3693
  • [36] A Novel Approach for Vietnamese Speech Recognition Using Conformer
    Tuan, Nguyen Van Anh
    Hoa, Nguyen Thi Thanh
    Dat, Nguyen Thanh
    Tuan, Pham Minh
    Truong, Dao Duy
    Phuc, Dang Thi
    FUTURE DATA AND SECURITY ENGINEERING. BIG DATA, SECURITY AND PRIVACY, SMART CITY AND INDUSTRY 4.0 APPLICATIONS, FDSE 2022, 2022, 1688 : 723 - 730
  • [37] 2 USEFUL TECHNIQUES FOR AUTOMATIC SPEECH RECOGNITION
    COKER, CH
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 64 : S179 - S179
  • [38] Automatic speech recognition
    O'Shaughnessy, Douglas
    2015 CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies (CHILECON), 2015, : 417 - 424
  • [39] Vector-Quantization based Mask Estimation for Missing Data Automatic Speech Recognition
    Van Segbroeck, Maarten
    Van Hamme, Hugo
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1825 - 1828
  • [40] AUTOMATIC SPEECH RECOGNITION
    IVALL, T
    ELECTRONICS & WIRELESS WORLD, 1984, 90 (1581) : 73 - 76