HIFIDENOISE: HIGH-FIDELITY DENOISING TEXT TO SPEECH WITH ADVERSARIAL NETWORKS

Authors
Zhang, Lichao [1]
Ren, Yi [1]
Deng, Liqun [2]
Zhao, Zhou [1]
Affiliations
[1] Zhejiang University, Hangzhou, Zhejiang, People's Republic of China
[2] Huawei Noah's Ark Lab, Shenzhen, Guangdong, People's Republic of China
Keywords
text to speech; singing voice synthesis; noisy audio; denoise; generative adversarial network
DOI
10.1109/ICASSP43922.2022.9747155
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Building a high-fidelity speech synthesis system from noisy speech data is a challenging but valuable task that could significantly reduce the cost of data collection. Existing methods usually either train speech synthesis systems on speech denoised by an enhancement model or feed noise information into the system as a condition. These methods do suppress noise to some extent, but the quality and prosody of their synthesized speech remain far from natural speech. In this paper, we propose HiFiDenoise, a speech synthesis system with adversarial networks that can synthesize high-fidelity speech from low-quality, noisy speech data. Specifically, 1) to tackle the difficulty of noise modeling, we introduce multi-length adversarial training in the noise condition module; 2) to handle inaccurate pitch extraction caused by noise, we remove the pitch predictor from the acoustic model and add discriminators to the mel-spectrogram generator; and 3) we also apply HiFiDenoise to singing voice synthesis with a noisy singing dataset. Experiments show that our model outperforms the baseline by 0.36 and 0.44 MOS on speech and singing, respectively.
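The abstract does not spell out how the multi-length adversarial training is set up. The sketch below shows one common way to structure such an objective and is not the authors' implementation. It assumes (none of this is stated in the record) a least-squares GAN loss, one discriminator per window length, and random crops taken at the same positions from the real and generated mel-spectrograms. The names `sample_crops`, `multi_length_losses`, and `mean_score` are hypothetical, and the mean-value "discriminator" is only a toy stand-in for a trainable network so the example runs with the standard library alone.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A mel-spectrogram as T frames, each a list of n_mels values.
Mel = List[List[float]]
# Hypothetical discriminator: maps a window of frames to a "realness" score.
Discriminator = Callable[[Mel], float]


def sample_crops(num_frames: int, lengths: Sequence[int],
                 rng: random.Random) -> List[Tuple[int, int]]:
    """Pick one random (start, length) crop per window length.

    Lengths longer than the clip are clipped to the clip length.
    """
    crops = []
    for length in lengths:
        length = min(length, num_frames)
        start = rng.randint(0, num_frames - length)  # inclusive upper bound
        crops.append((start, length))
    return crops


def multi_length_losses(real: Mel, fake: Mel,
                        discs: Sequence[Discriminator],
                        lengths: Sequence[int],
                        rng: random.Random) -> Tuple[float, float]:
    """Least-squares GAN losses averaged over one discriminator per window length.

    Returns (discriminator_loss, generator_loss). Real and generated
    mel-spectrograms are cropped at the same positions (an assumption).
    """
    if len(real) != len(fake):
        raise ValueError("real and fake must have the same number of frames")
    if len(discs) != len(lengths):
        raise ValueError("need exactly one discriminator per window length")
    d_loss = g_loss = 0.0
    for disc, (start, length) in zip(discs, sample_crops(len(real), lengths, rng)):
        d_real = disc(real[start:start + length])
        d_fake = disc(fake[start:start + length])
        d_loss += (d_real - 1.0) ** 2 + d_fake ** 2  # push real -> 1, fake -> 0
        g_loss += (d_fake - 1.0) ** 2                # generator wants fake -> 1
    n = len(discs)
    return d_loss / n, g_loss / n


if __name__ == "__main__":
    # Toy stand-in for a trainable discriminator: the mean value of the window.
    def mean_score(window: Mel) -> float:
        return sum(sum(frame) for frame in window) / (len(window) * len(window[0]))

    real = [[1.0] * 4 for _ in range(50)]  # 50 frames, 4 mel bins
    fake = [[0.0] * 4 for _ in range(50)]
    print(multi_length_losses(real, fake, [mean_score] * 3, [8, 16, 64],
                              random.Random(0)))  # prints (0.0, 1.0)
```

Using several window lengths lets short windows judge local spectral detail while long windows judge longer-range structure; in a real system each discriminator would be a neural network and both losses would be backpropagated.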
Pages: 7232-7236
Number of pages: 5