Neural speech enhancement with unsupervised pre-training and mixture training

Cited by: 10
Authors:
Hao, Xiang [1]
Xu, Chenglin [2]
Xie, Lei [1]
Affiliations:
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp, Xian, Peoples R China
[2] Kuaishou Technol, Beijing, Peoples R China
Keywords:
Speech enhancement; Neural network; Unsupervised pre-training; Mixture training; Noise
DOI: 10.1016/j.neunet.2022.11.013
CLC classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Supervised neural speech enhancement methods require large amounts of paired noisy and clean speech. Because collecting adequate paired data from real-world applications is infeasible, supervised methods are typically trained on simulated data. However, the mismatch between simulated and in-the-wild data causes inconsistent performance when such systems are deployed in real-world applications. Unsupervised speech enhancement methods address this mismatch by training directly on in-the-wild noisy data without access to the corresponding clean speech, so simulated paired data is not required; their performance, however, does not match that of supervised methods. To address both problems, this work proposes an unsupervised pre-training and mixture training algorithm that combines the advantages of supervised and unsupervised learning. Specifically, the approach first performs unsupervised pre-training on large volumes of unpaired noisy and clean speech. The in-the-wild noisy data and a small amount of simulated paired data are then used for mixture training to optimize the pre-trained model. Experimental results show that the proposed method outperforms state-of-the-art supervised and unsupervised methods. (c) 2022 Elsevier Ltd. All rights reserved.
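As a rough sketch of the two-stage schedule described in the abstract, the PyTorch snippet below pre-trains an enhancer on unpaired clean and noisy speech (Stage 1) and then fine-tunes it with a mixture of a supervised loss on simulated pairs and an unsupervised loss on in-the-wild noisy data (Stage 2). The Enhancer architecture, the clean-identity and self-consistency losses, and the alpha weight are illustrative assumptions, not the paper's actual objectives.

```python
# Illustrative two-stage schedule only; the architecture and loss terms are
# assumptions, not the objectives actually used by Hao et al. (2022).
import torch
import torch.nn as nn

class Enhancer(nn.Module):
    """Hypothetical mask-based enhancer operating on magnitude spectrograms."""
    def __init__(self, n_freq: int = 257, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mag: torch.Tensor) -> torch.Tensor:  # (batch, frames, n_freq)
        h, _ = self.rnn(mag)
        return self.mask(h) * mag  # apply the estimated mask to the input

mse = nn.MSELoss()

def pretrain_step(model, opt, clean, noisy):
    """Stage 1: unsupervised pre-training on UNPAIRED clean and noisy batches.
    Stand-in objectives that need no paired targets: clean speech should pass
    through unchanged, and enhancing twice should match enhancing once."""
    opt.zero_grad()
    identity = mse(model(clean), clean)          # clean input is a fixed point
    est = model(noisy)
    consistency = mse(model(est), est.detach())  # self-consistency on noisy data
    loss = identity + consistency
    loss.backward()
    opt.step()
    return loss.item()

def mixture_step(model, opt, sim_noisy, sim_clean, wild_noisy, alpha=0.5):
    """Stage 2: mixture training on a small simulated paired set plus
    in-the-wild noisy speech; `alpha` (an assumed hyperparameter) balances
    the supervised and unsupervised terms."""
    opt.zero_grad()
    supervised = mse(model(sim_noisy), sim_clean)
    est = model(wild_noisy)
    unsupervised = mse(model(est), est.detach())
    loss = supervised + alpha * unsupervised
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    model = Enhancer()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    clean = torch.rand(4, 100, 257)  # dummy magnitude spectrograms
    noisy = torch.rand(4, 100, 257)
    print(pretrain_step(model, opt, clean, noisy))        # Stage 1
    print(mixture_step(model, opt, noisy, clean, noisy))  # Stage 2
```

In this sketch the pre-trained weights carry over into mixture training simply because the same model and optimizer are reused; in practice Stage 1 would run to convergence on the large unpaired corpora before switching to Stage 2, with alpha tuned on a validation set.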
Pages: 216-227 (12 pages)
Related papers (50 in total):
  • [21] Unsupervised Pre-training for Temporal Action Localization Tasks
    Zhang, Can
    Yang, Tianyu
    Weng, Junwu
    Cao, Meng
    Wang, Jue
    Zou, Yuexian
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 14011-14021
  • [22] Exploring unsupervised pre-training for echo state networks
    Steiner, Peter
    Jalalvand, Azarakhsh
    Birkholz, Peter
    NEURAL COMPUTING & APPLICATIONS, 2023, 35(34): 24225-24242
  • [23] Pre-training on dynamic graph neural networks
    Chen, Ke-Jia
    Zhang, Jiajun
    Jiang, Linpu
    Wang, Yunyun
    Dai, Yuxuan
    NEUROCOMPUTING, 2022, 500: 679-687
  • [24] Pre-training Methods for Neural Machine Translation
    Wang, Mingxuan
    Li, Lei
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: TUTORIAL ABSTRACTS, 2021: 21-25
  • [25] Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
    Reddy, Arun
    Paul, William
    Rivera, Corban
    Shah, Ketul
    de Melo, Celso M.
    Chellappa, Rama
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 18919-18929
  • [26] An Empirical Study on Unsupervised Pre-training Approaches in Regression Problems
    Saikia, Pallabi
    Baruah, Rashmi Dutta
    2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018: 342-349
  • [27] GENERATIVE PRE-TRAINING FOR SPEECH WITH AUTOREGRESSIVE PREDICTIVE CODING
    Chung, Yu-An
    Glass, James
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 3497-3501
  • [28] Unsupervised pre-training of graph transformers on patient population graphs
    Pellegrini, Chantal
    Navab, Nassir
    Kazi, Anees
    MEDICAL IMAGE ANALYSIS, 2023, 89
  • [29] TRANSFORMER BASED UNSUPERVISED PRE-TRAINING FOR ACOUSTIC REPRESENTATION LEARNING
    Zhang, Ruixiong
    Wu, Haiwei
    Li, Wubo
    Jiang, Dongwei
    Zou, Wei
    Li, Xiangang
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 6933-6937
  • [30] Why Does Unsupervised Pre-training Help Deep Learning?
    Erhan, Dumitru
    Bengio, Yoshua
    Courville, Aaron
    Manzagol, Pierre-Antoine
    Vincent, Pascal
    Bengio, Samy
    JOURNAL OF MACHINE LEARNING RESEARCH, 2010, 11: 625-660