Neural speech enhancement with unsupervised pre-training and mixture training

Cited by: 10
Authors
Hao, Xiang [1 ]
Xu, Chenglin [2 ]
Xie, Lei [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp, Xian, Peoples R China
[2] Kuaishou Technol, Beijing, Peoples R China
Keywords
Speech enhancement; Neural network; Unsupervised pre-training; Mixture training; Noise
DOI
10.1016/j.neunet.2022.11.013
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Supervised neural speech enhancement methods require large amounts of paired noisy and clean speech data. Because collecting adequate paired data from real-world applications is infeasible, supervised methods typically rely on simulated data. However, the mismatch between simulated and in-the-wild data causes inconsistent performance when the system is deployed in real-world applications. Unsupervised speech enhancement methods address this mismatch by directly using in-the-wild noisy data without access to the corresponding clean speech, so simulated paired data is not necessary; however, their performance is not yet on par with supervised learning methods. To address both problems, this work proposes an unsupervised pre-training and mixture training algorithm that leverages the advantages of supervised and unsupervised learning. Specifically, the proposed approach first uses large volumes of unpaired noisy and clean speech for unsupervised pre-training. The in-the-wild noisy data and a small amount of simulated paired data are then used for mixture training to optimize the pre-trained model. Experimental results show that the proposed method outperforms other state-of-the-art supervised and unsupervised learning methods. (c) 2022 Elsevier Ltd. All rights reserved.
Pages: 216 - 227
Number of pages: 12
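
To make the two-stage recipe described in the abstract concrete, the snippet below is a minimal PyTorch sketch under stated assumptions, not the paper's exact method: the MaskNet model, the feature dimensions, the reconstruction objective used for unsupervised pre-training, and the alpha-weighted combination of supervised and unsupervised losses in the mixture-training stage are all illustrative choices made here for clarity.

    # Illustrative two-stage training loop (NOT the paper's exact recipe).
    # Assumptions: a small mask-based enhancement network over spectrogram-like
    # features, an autoencoding reconstruction loss for unsupervised
    # pre-training, and a weighted mix of a supervised L1 loss (small paired
    # simulated set) and an unsupervised reconstruction loss (in-the-wild
    # noisy speech) for mixture training.
    import torch
    import torch.nn as nn

    class MaskNet(nn.Module):
        """Tiny stand-in for an enhancement network that predicts a mask."""
        def __init__(self, feat_dim=257, hidden=256):
            super().__init__()
            self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, feat_dim)

        def forward(self, x):                  # x: (batch, frames, feat_dim)
            h, _ = self.rnn(x)
            mask = torch.sigmoid(self.out(h))  # mask values in [0, 1]
            return mask * x                    # enhanced features

    model = MaskNet()
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)
    l1 = nn.L1Loss()

    # ---- Stage 1: unsupervised pre-training on unpaired data -------------
    # Random tensors stand in for large unpaired corpora of clean and noisy
    # speech features; the model is pre-trained to reconstruct its input.
    unpaired_clean = torch.rand(8, 100, 257)
    unpaired_noisy = torch.rand(8, 100, 257)
    for batch in (unpaired_clean, unpaired_noisy):
        optim.zero_grad()
        loss = l1(model(batch), batch)         # reconstruction objective
        loss.backward()
        optim.step()

    # ---- Stage 2: mixture training ----------------------------------------
    # A small simulated paired set (noisy -> clean) provides a supervised
    # loss; in-the-wild noisy data keeps contributing an unsupervised term.
    paired_noisy = torch.rand(4, 100, 257)
    paired_clean = torch.rand(4, 100, 257)
    wild_noisy = torch.rand(4, 100, 257)
    alpha = 0.5                                # illustrative loss weighting
    for _ in range(2):
        optim.zero_grad()
        sup_loss = l1(model(paired_noisy), paired_clean)
        unsup_loss = l1(model(wild_noisy), wild_noisy)
        (sup_loss + alpha * unsup_loss).backward()
        optim.step()

The random tensors and the specific losses are placeholders; the point of the sketch is the structure, i.e. pre-train on large unpaired data first, then fine-tune with a combined supervised and unsupervised objective on a mixture of small paired simulated data and in-the-wild noisy data.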