Neural speech enhancement with unsupervised pre-training and mixture training

Cited by: 10
Authors
Hao, Xiang [1 ]
Xu, Chenglin [2 ]
Xie, Lei [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp, Xian, Peoples R China
[2] Kuaishou Technol, Beijing, Peoples R China
Keywords
Speech enhancement; Neural network; Unsupervised pre-training; Mixture training; NOISE;
DOI
10.1016/j.neunet.2022.11.013
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Supervised neural speech enhancement methods typically require large amounts of paired noisy and clean speech data. Because collecting sufficient paired data from real-world applications is infeasible, supervised methods usually rely on simulated data. However, the mismatch between simulated and in-the-wild data often leads to inconsistent performance when the system is deployed in real-world applications. Unsupervised speech enhancement methods address this mismatch by directly using in-the-wild noisy data without access to the corresponding clean speech, so simulated paired data is not required. However, their performance is not on par with that of supervised learning methods. To address these problems, this work proposes an unsupervised pre-training and mixture training algorithm that leverages the advantages of both supervised and unsupervised learning. Specifically, the proposed speech enhancement approach first uses large volumes of unpaired noisy and clean speech for unsupervised pre-training. The in-the-wild noisy data and a small amount of simulated paired data are then used for mixture training to optimize the pre-trained model. Experimental results show that the proposed method outperforms other state-of-the-art supervised and unsupervised learning methods. (c) 2022 Elsevier Ltd. All rights reserved.
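The abstract describes a two-stage recipe: unsupervised pre-training on unpaired noisy and clean speech, followed by mixture training that combines an unsupervised term on in-the-wild noisy data with a supervised term on a small simulated paired set. The PyTorch sketch below is only a minimal illustration of that flow under stated assumptions: the toy `Enhancer` model, the placeholder reconstruction losses used for the unsupervised objectives, and the weighting `alpha` are all hypothetical, since this record does not specify the paper's actual architecture or loss functions.

```python
import torch
import torch.nn as nn

class Enhancer(nn.Module):
    """Toy magnitude-spectrogram enhancer standing in for the real network."""
    def __init__(self, dim=257):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.net(x)

model = Enhancer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()

# Dummy feature batches (batch, frames, bins) standing in for real corpora.
unpaired_noisy = torch.rand(8, 100, 257)   # large unpaired noisy set
unpaired_clean = torch.rand(8, 100, 257)   # large unpaired clean set
wild_noisy     = torch.rand(8, 100, 257)   # in-the-wild noisy data (no clean reference)
sim_noisy      = torch.rand(2, 100, 257)   # small simulated paired set (noisy side)
sim_clean      = torch.rand(2, 100, 257)   # small simulated paired set (clean side)

# --- Stage 1: unsupervised pre-training on unpaired data (placeholder objectives) ---
for step in range(10):
    opt.zero_grad()
    # Consistency objective on clean speech: clean input should pass through unchanged.
    loss_clean = mse(model(unpaired_clean), unpaired_clean)
    # Proxy objective on noisy speech without a clean reference; a stand-in for the
    # paper's unsupervised loss, not its actual formulation.
    loss_noisy = mse(model(unpaired_noisy), unpaired_noisy)
    (loss_clean + 0.1 * loss_noisy).backward()
    opt.step()

# --- Stage 2: mixture training: supervised term on simulated pairs + unsupervised term on wild noisy data ---
alpha = 0.5  # hypothetical weight between the two terms
for step in range(10):
    opt.zero_grad()
    sup_loss = mse(model(sim_noisy), sim_clean)       # supervised, simulated pairs
    unsup_loss = mse(model(wild_noisy), wild_noisy)   # unsupervised proxy on wild data
    (sup_loss + alpha * unsup_loss).backward()
    opt.step()
```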
Pages: 216-227
Page count: 12
Related Papers
50 records in total
  • [31] Unsupervised Point Cloud Pre-training via Occlusion Completion
    Wang, Hanchen
    Liu, Qi
    Yue, Xiangyu
    Lasenby, Joan
    Kusner, Matt J.
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 9762 - 9772
  • [32] UNSUPERVISED POINT CLOUD PRE-TRAINING VIA CONTRASTING AND CLUSTERING
    Mei, Guofeng
    Huang, Xiaoshui
    Liu, Juan
    Zhang, Jian
    Wu, Qiang
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 66 - 70
  • [33] Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech
    Lin, Jingru
    Ge, Meng
    Wang, Wupeng
    Li, Haizhou
    Feng, Mengling
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 1014 - 1018
  • [34] A STUDY ON THE EFFICACY OF MODEL PRE-TRAINING IN DEVELOPING NEURAL TEXT-TO-SPEECH SYSTEM
    Zhang, Guangyan
    Leng, Yichong
    Tan, Daxin
    Qin, Ying
    Song, Kaitao
    Tan, Xu
    Zhao, Sheng
    Lee, Tan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6087 - 6091
  • [35] MALUP: A Malware Classification Framework using Convolutional Neural Network with Deep Unsupervised Pre-training
    Qiang, Qian
    Cheng, Mian
    Zhou, Yuan
    Ding, Yu
    Qi, Zisen
    2021 IEEE 20TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2021), 2021, : 627 - 634
  • [36] Unified Speech-Text Pre-training for Speech Translation and Recognition
    Tang, Yun
    Gong, Hongyu
    Dong, Ning
    Wang, Changhan
    Hsu, Wei-Ning
    Gu, Jiatao
    Baevski, Alexei
    Li, Xian
    Mohamed, Abdelrahman
    Auli, Michael
    Pino, Juan
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 1488 - 1499
  • [37] Neural Graph Matching for Pre-training Graph Neural Networks
    Hou, Yupeng
    Hu, Binbin
    Zhao, Wayne Xin
    Zhang, Zhiqiang
    Zhou, Jun
    Wen, Ji-Rong
    PROCEEDINGS OF THE 2022 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2022, : 172 - 180
  • [38] Pre-training Techniques for Improving Text-to-Speech Synthesis by Automatic Speech Recognition Based Data Enhancement
    Liu, Yazhu
    Xue, Shaofei
    Tang, Jian
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 162 - 172
  • [39] Multilingual Denoising Pre-training for Neural Machine Translation
    Liu, Yinhan
    Gu, Jiatao
    Goyal, Naman
    Li, Xian
    Edunov, Sergey
    Ghazvininejad, Marjan
    Lewis, Mike
    Zettlemoyer, Luke
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2020, 8 : 726 - 742
  • [40] Curriculum pre-training for stylized neural machine translation
    Zou, Aixiao
    Wu, Xuanxuan
    Li, Xinjie
    Zhang, Ting
    Cui, Fuwei
    Xu, Jinan
    APPLIED INTELLIGENCE, 2024, 54 (17-18) : 7958 - 7968