A NOISE-ROBUST SELF-SUPERVISED PRE-TRAINING MODEL BASED SPEECH REPRESENTATION LEARNING FOR AUTOMATIC SPEECH RECOGNITION

Cited: 20
Authors
Zhu, Qiu-Shi [1]
Zhang, Jie [1,2]
Zhang, Zi-Qiang [1]
Wu, Ming-Hui [1]
Fang, Xin [1]
Dai, Li-Rong [1]
Affiliations
[1] Univ Sci & Technol China USTC, NEL SLIP, Hefei, Peoples R China
[2] Chinese Acad Sci, Inst Acoust, State Key Lab Acoust, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Wav2vec2.0; speech recognition; noise robustness; self-supervised pre-training; speech representation;
DOI
10.1109/ICASSP43922.2022.9747379
Chinese Library Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Wav2vec2.0 is a popular self-supervised pre-training framework for learning speech representations in the context of automatic speech recognition (ASR). It has been shown that wav2vec2.0 is fairly robust to domain shift, but its robustness to noise remains unclear. In this work, we therefore first analyze the noise robustness of wav2vec2.0 experimentally. We observe that wav2vec2.0 pre-trained on noisy data can learn good representations and thus improve ASR performance on the noisy test set, but at the cost of a performance degradation on the clean test set. To avoid this issue, we propose an enhanced wav2vec2.0 model in which the noisy speech and its clean counterpart are fed into the same feature encoder, with the clean speech providing the training targets. Experimental results show that the proposed method not only improves ASR performance on the noisy test set beyond the original wav2vec2.0, but also incurs only a negligible performance decrease on the clean test set. In addition, the effectiveness of the proposed method is demonstrated under different types of noise.
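To make the abstract's core idea concrete, below is a minimal, non-authoritative PyTorch-style sketch (not the authors' implementation): the noisy waveform and its clean counterpart pass through the same shared feature encoder, and the clean branch supplies the training targets at masked frames. The module names (encoder, context, quantizer) and the simplified cosine-similarity loss are illustrative placeholders only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseRobustWav2Vec2Sketch(nn.Module):
    """Sketch: one shared feature encoder for noisy and clean speech;
    the clean branch provides the training targets (assumed design)."""

    def __init__(self, encoder: nn.Module, context: nn.Module, quantizer: nn.Module):
        super().__init__()
        self.encoder = encoder      # shared feature encoder (CNN over raw waveform in practice)
        self.context = context      # Transformer-style context network
        self.quantizer = quantizer  # maps clean features to (discrete) targets

    def forward(self, noisy_wav: torch.Tensor, clean_wav: torch.Tensor, mask: torch.Tensor):
        # mask: (batch, frames) boolean tensor marking masked time steps
        z_noisy = self.encoder(noisy_wav)          # (batch, frames, dim), noisy branch
        with torch.no_grad():
            z_clean = self.encoder(clean_wav)      # same encoder applied to the clean speech
        targets = self.quantizer(z_clean)          # targets derived from the clean input
        c = self.context(z_noisy.masked_fill(mask.unsqueeze(-1), 0.0))
        # Simplified stand-in for the pre-training objective: pull the predicted
        # context at masked frames towards the clean-speech targets.
        sim = F.cosine_similarity(c, targets, dim=-1)   # (batch, frames)
        return -(sim[mask]).mean()

In the actual wav2vec2.0 objective the targets are quantized codebook entries and the loss is contrastive over sampled distractors; the cosine term above is only a placeholder for that step.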
Pages: 3174-3178
Page count: 5
Related Papers
50 items in total (items [41]-[50] shown)
  • [41] Phonetically Motivated Self-Supervised Speech Representation Learning
    Yue, Xianghu
    Li, Haizhou
    INTERSPEECH 2021, 2021, : 746 - 750
  • [42] Research on Mongolian Speech Recognition Based on the Self-supervised Model
    Su, Hongyi
    Xue, Yu
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 199 - 203
  • [43] Self-supervised Learning and Masked Language Model for Code-switching Automatic Speech Recognition
    Chen, Po-Kai
    Fu, Li-Yeh
    Chen, Cheng-Kai
    Lin, Yi-Xing
    Chen, Chih-Ping
    Huang, Chien-Lin
    Wang, Jia-Ching
    2024 IEEE TENTH INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS, ICCE 2024, 2024, : 387 - 391
  • [44] DialogueBERT: A Self-Supervised Learning based Dialogue Pre-training Encoder
    Zhang, Zhenyu
    Guo, Tao
    Chen, Meng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 3647 - 3651
  • [45] Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
    Zhang, Wangyou
    Qian, Yanmin
    INTERSPEECH 2023, 2023, : 3517 - 3521
  • [46] SELF-TRAINING AND PRE-TRAINING ARE COMPLEMENTARY FOR SPEECH RECOGNITION
    Xu, Qiantong
    Baevski, Alexei
    Likhomanenko, Tatiana
    Tomasello, Paden
    Conneau, Alexis
    Collobert, Ronan
    Synnaeve, Gabriel
    Auli, Michael
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3030 - 3034
  • [47] Clustering and Retraining Based Self-Supervised Speech Representation Learning Method
    Zhang, Wenlin
    Liu, Xuepeng
    Niu, Tong
    Yang, Xukui
    Qu, Dan
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2022, 35 (05): 461 - 471
  • [48] SENTIMENT-AWARE AUTOMATIC SPEECH RECOGNITION PRE-TRAINING FOR ENHANCED SPEECH EMOTION RECOGNITION
    Ghriss, Ayoub
    Yang, Bo
    Rozgic, Viktor
    Shriberg, Elizabeth
    Wang, Chao
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7347 - 7351
  • [49] Unsupervised modulation filter learning for noise-robust speech recognition
    Agrawal, Purvi
    Ganapathy, Sriram
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 142 (03): : 1686 - 1692
  • [50] Self-supervised ECG pre-training
    Liu, Han
    Zhao, Zhenbo
    She, Qiang
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2021, 70