HIFIDENOISE: HIGH-FIDELITY DENOISING TEXT TO SPEECH WITH ADVERSARIAL NETWORKS

Cited by: 4
Authors
Zhang, Lichao [1]
Ren, Yi [1]
Deng, Liqun [2]
Zhao, Zhou [1]
Affiliations
[1] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[2] Huawei Noahs Ark Lab, Shenzhen, Guangdong, Peoples R China
Keywords
text to speech; singing voice synthesis; noisy audio; denoise; generative adversarial network
DOI
10.1109/ICASSP43922.2022.9747155
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Building a high-fidelity speech synthesis system with noisy speech data is a challenging but valuable task, which could significantly reduce the cost of data collection. Existing methods usually train speech synthesis systems based on the speech denoised with an enhancement model or feed noise information as a condition into the system. These methods certainly have some effect on inhibiting noise, but the quality and the prosody of their synthesized speech are still far away from natural speech. In this paper, we propose HiFiDenoise, a speech synthesis system with adversarial networks that can synthesize high-fidelity speech with low-quality and noisy speech data. Specifically, 1) to tackle the difficulty of noise modeling, we introduce multi-length adversarial training in the noise condition module. 2) To handle the problem of inaccurate pitch extraction caused by noise, we remove the pitch predictor in the acoustic model and also add discriminators on the mel-spectrogram generator. 3) In addition, we also apply HiFiDenoise to singing voice synthesis with a noisy singing dataset. Experiments show that our model outperforms the baseline by 0.36 and 0.44 in terms of MOS on speech and singing respectively.
Pages: 7232-7236
Number of pages: 5