Neural speech enhancement with unsupervised pre-training and mixture training

Cited by: 10
Authors:
Hao, Xiang [1]
Xu, Chenglin [2]
Xie, Lei [1]
Affiliations:
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp, Xian, Peoples R China
[2] Kuaishou Technol, Beijing, Peoples R China
Keywords:
Speech enhancement; Neural network; Unsupervised pre-training; Mixture training; Noise
DOI: 10.1016/j.neunet.2022.11.013
CLC classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Supervised neural speech enhancement methods require large amounts of paired noisy and clean speech. Because collecting adequate paired data from real-world applications is infeasible, supervised methods are typically trained on simulated data. However, the mismatch between simulated and in-the-wild data causes inconsistent performance when such systems are deployed in real-world applications. Unsupervised speech enhancement methods address this mismatch by training directly on in-the-wild noisy data without access to the corresponding clean speech, so simulated paired data is not required; their performance, however, does not match that of supervised methods. To address both problems, this work proposes an unsupervised pre-training and mixture training algorithm that combines the advantages of supervised and unsupervised learning. Specifically, the approach first performs unsupervised pre-training on large volumes of unpaired noisy and clean speech. The in-the-wild noisy data and a small amount of simulated paired data are then used for mixture training to optimize the pre-trained model. Experimental results show that the proposed method outperforms state-of-the-art supervised and unsupervised methods. (c) 2022 Elsevier Ltd. All rights reserved.
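As a rough sketch of the two-stage schedule described in the abstract, the PyTorch snippet below pre-trains an enhancer on unpaired clean and noisy speech (Stage 1) and then fine-tunes it with a mixture of a supervised loss on simulated pairs and an unsupervised loss on in-the-wild noisy data (Stage 2). The Enhancer architecture, the clean-identity and self-consistency losses, and the alpha weight are illustrative assumptions, not the paper's actual objectives.

```python
# Illustrative two-stage schedule only; the architecture and loss terms are
# assumptions, not the objectives actually used by Hao et al. (2022).
import torch
import torch.nn as nn

class Enhancer(nn.Module):
    """Hypothetical mask-based enhancer operating on magnitude spectrograms."""
    def __init__(self, n_freq: int = 257, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mag: torch.Tensor) -> torch.Tensor:  # (batch, frames, n_freq)
        h, _ = self.rnn(mag)
        return self.mask(h) * mag  # apply the estimated mask to the input

mse = nn.MSELoss()

def pretrain_step(model, opt, clean, noisy):
    """Stage 1: unsupervised pre-training on UNPAIRED clean and noisy batches.
    Stand-in objectives that need no paired targets: clean speech should pass
    through unchanged, and enhancing twice should match enhancing once."""
    opt.zero_grad()
    identity = mse(model(clean), clean)          # clean input is a fixed point
    est = model(noisy)
    consistency = mse(model(est), est.detach())  # self-consistency on noisy data
    loss = identity + consistency
    loss.backward()
    opt.step()
    return loss.item()

def mixture_step(model, opt, sim_noisy, sim_clean, wild_noisy, alpha=0.5):
    """Stage 2: mixture training on a small simulated paired set plus
    in-the-wild noisy speech; `alpha` (an assumed hyperparameter) balances
    the supervised and unsupervised terms."""
    opt.zero_grad()
    supervised = mse(model(sim_noisy), sim_clean)
    est = model(wild_noisy)
    unsupervised = mse(model(est), est.detach())
    loss = supervised + alpha * unsupervised
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    model = Enhancer()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    clean = torch.rand(4, 100, 257)  # dummy magnitude spectrograms
    noisy = torch.rand(4, 100, 257)
    print(pretrain_step(model, opt, clean, noisy))        # Stage 1
    print(mixture_step(model, opt, noisy, clean, noisy))  # Stage 2
```

In this sketch the pre-trained weights carry over into mixture training simply because the same model and optimizer are reused; in practice Stage 1 would run to convergence on the large unpaired corpora before switching to Stage 2, with alpha tuned on a validation set.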
Pages: 216-227 (12 pages)
Related papers (50 in total):
  • [21] Unsupervised Pre-training for Temporal Action Localization Tasks
    Zhang, Can
    Yang, Tianyu
    Weng, Junwu
    Cao, Meng
    Wang, Jue
    Zou, Yuexian
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 14011-14021
  • [22] Exploring unsupervised pre-training for echo state networks
    Steiner, Peter
    Jalalvand, Azarakhsh
    Birkholz, Peter
    NEURAL COMPUTING & APPLICATIONS, 2023, 35(34): 24225-24242
  • [23] Pre-training on dynamic graph neural networks
    Chen, Ke-Jia
    Zhang, Jiajun
    Jiang, Linpu
    Wang, Yunyun
    Dai, Yuxuan
    NEUROCOMPUTING, 2022, 500: 679-687
  • [24] Pre-training Methods for Neural Machine Translation
    Wang, Mingxuan
    Li, Lei
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: TUTORIAL ABSTRACTS, 2021: 21-25
  • [25] Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
    Reddy, Arun
    Paul, William
    Rivera, Corban
    Shah, Ketul
    de Melo, Celso M.
    Chellappa, Rama
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 18919-18929
  • [26] An Empirical Study on Unsupervised Pre-training Approaches in Regression Problems
    Saikia, Pallabi
    Baruah, Rashmi Dutta
    2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018: 342-349
  • [27] GENERATIVE PRE-TRAINING FOR SPEECH WITH AUTOREGRESSIVE PREDICTIVE CODING
    Chung, Yu-An
    Glass, James
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 3497-3501
  • [28] Unsupervised pre-training of graph transformers on patient population graphs
    Pellegrini, Chantal
    Navab, Nassir
    Kazi, Anees
    MEDICAL IMAGE ANALYSIS, 2023, 89
  • [29] TRANSFORMER BASED UNSUPERVISED PRE-TRAINING FOR ACOUSTIC REPRESENTATION LEARNING
    Zhang, Ruixiong
    Wu, Haiwei
    Li, Wubo
    Jiang, Dongwei
    Zou, Wei
    Li, Xiangang
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 6933-6937
  • [30] Why Does Unsupervised Pre-training Help Deep Learning?
    Erhan, Dumitru
    Bengio, Yoshua
    Courville, Aaron
    Manzagol, Pierre-Antoine
    Vincent, Pascal
    Bengio, Samy
    JOURNAL OF MACHINE LEARNING RESEARCH, 2010, 11: 625-660