Neural speech enhancement with unsupervised pre-training and mixture training

Cited by: 10
Authors
Hao, Xiang [1 ]
Xu, Chenglin [2 ]
Xie, Lei [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp, Xian, Peoples R China
[2] Kuaishou Technol, Beijing, Peoples R China
Keywords
Speech enhancement; Neural network; Unsupervised pre-training; Mixture training; NOISE;
DOI
10.1016/j.neunet.2022.11.013
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Supervised neural speech enhancement methods require large amounts of paired noisy and clean speech data. Since collecting adequate paired data from real-world applications is infeasible, supervised methods typically rely on simulated data. However, the mismatch between simulated data and in-the-wild data causes inconsistent performance when the system is deployed in real-world applications. Unsupervised speech enhancement methods address this mismatch by directly using in-the-wild noisy data without access to the corresponding clean speech, so simulated paired data is not required. However, the performance of unsupervised speech enhancement methods is not on par with that of supervised learning methods. To address both problems, this work proposes an unsupervised pre-training and mixture training algorithm that leverages the advantages of supervised and unsupervised learning. Specifically, the proposed approach employs large volumes of unpaired noisy and clean speech for unsupervised pre-training. The noisy data and a small amount of simulated paired data are then used for mixture training to optimize the pre-trained model. Experimental results show that the proposed method outperforms other state-of-the-art supervised and unsupervised learning methods. (c) 2022 Elsevier Ltd. All rights reserved.
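The abstract describes a two-stage recipe (unsupervised pre-training followed by mixture training) but gives no implementation detail, so the sketch below only illustrates the overall flow and is not the paper's method. The model EnhancementNet, the autoencoding pre-training loss, the consistency-style unsupervised term, and the weight alpha in mixture_train are hypothetical choices made for the sake of a runnable example.

# Minimal two-stage training sketch in PyTorch. All objectives below are
# illustrative assumptions; the paper's actual architecture and losses are
# not specified in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

class EnhancementNet(nn.Module):
    """Toy mask-based enhancement model over magnitude-spectrogram frames."""
    def __init__(self, n_bins=257, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins), nn.Sigmoid(),  # predicts a [0, 1] mask
        )

    def forward(self, noisy_mag):
        return noisy_mag * self.net(noisy_mag)  # masked (enhanced) magnitude

def pretrain_unsupervised(model, unpaired_loader, epochs=1, lr=1e-3):
    """Stage 1: pre-train on unpaired noisy and clean speech.
    Here each unlabeled utterance is simply reconstructed from itself
    (an autoencoding objective), an assumed stand-in for the paper's
    unsupervised pre-training loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for (mag,) in unpaired_loader:
            loss = F.l1_loss(model(mag), mag)
            opt.zero_grad()
            loss.backward()
            opt.step()

def mixture_train(model, noisy_loader, paired_loader, epochs=1, lr=1e-4, alpha=0.5):
    """Stage 2: optimize the pre-trained model with a mixture of
    (a) a supervised loss on a small simulated paired set and
    (b) an unsupervised consistency-style loss on in-the-wild noisy data.
    The consistency term and the weight `alpha` are illustrative assumptions."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for (noisy_only,), (noisy, clean) in zip(noisy_loader, paired_loader):
            sup = F.l1_loss(model(noisy), clean)   # supervised term on paired data
            est = model(noisy_only)
            unsup = F.l1_loss(model(est), est)     # enhanced output should be stable under re-enhancement
            loss = sup + alpha * unsup
            opt.zero_grad()
            loss.backward()
            opt.step()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = EnhancementNet()
    # Random tensors stand in for magnitude-spectrogram frames; real loaders
    # would yield STFT features of actual recordings.
    unpaired = DataLoader(TensorDataset(torch.rand(64, 257)), batch_size=8)
    wild_noisy = DataLoader(TensorDataset(torch.rand(64, 257)), batch_size=8)
    paired = DataLoader(TensorDataset(torch.rand(64, 257), torch.rand(64, 257)), batch_size=8)
    pretrain_unsupervised(model, unpaired)
    mixture_train(model, wild_noisy, paired)

The structural point the sketch tries to capture is that stage 2 starts from the stage-1 weights and combines a supervised loss on the small simulated paired set with an unsupervised term on in-the-wild noisy data, matching the two-stage procedure described in the abstract.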
Pages: 216-227
Number of pages: 12