HIFIDENOISE: HIGH-FIDELITY DENOISING TEXT TO SPEECH WITH ADVERSARIAL NETWORKS

Cited by: 4
Authors
Zhang, Lichao [1]
Ren, Yi [1]
Deng, Liqun [2]
Zhao, Zhou [1]
Affiliations
[1] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
[2] Huawei Noahs Ark Lab, Shenzhen, Guangdong, Peoples R China
Keywords
text to speech; singing voice synthesis; noisy audio; denoise; generative adversarial network
DOI
10.1109/ICASSP43922.2022.9747155
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Building a high-fidelity speech synthesis system from noisy speech data is a challenging but valuable task, because it could significantly reduce the cost of data collection. Existing methods usually either train the synthesis system on speech that has first been denoised by an enhancement model, or feed noise information into the system as a condition. These methods do suppress noise to some extent, but the quality and prosody of the speech they synthesize remain far from natural speech. In this paper, we propose HiFiDenoise, a speech synthesis system with adversarial networks that can synthesize high-fidelity speech from low-quality, noisy speech data. Specifically: 1) to tackle the difficulty of noise modeling, we introduce multi-length adversarial training in the noise condition module; 2) to handle inaccurate pitch extraction caused by noise, we remove the pitch predictor from the acoustic model and add discriminators to the mel-spectrogram generator; 3) we also apply HiFiDenoise to singing voice synthesis with a noisy singing dataset. Experiments show that our model outperforms the baseline by 0.36 and 0.44 MOS on speech and singing, respectively.
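The abstract names multi-length adversarial training but does not give window lengths, discriminator architectures, or the loss form. As a hedged illustration only, the general idea of cropping random windows of several lengths from a frame sequence (such as a mel-spectrogram) and scoring each with its own discriminator can be sketched in plain Python. Everything here is an assumption, not the authors' implementation: the least-squares GAN objective, the hypothetical names `sample_window`, `multi_length_adv_losses`, and `toy_discriminator`, and the toy scorer, which is a fixed sigmoid of the window mean rather than a learned network.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# A frame is one column of a mel-spectrogram (one value per mel bin).
Frame = List[float]
# Hypothetical discriminator type: maps a window of frames to a realness score in (0, 1).
Discriminator = Callable[[Sequence[Frame]], float]


def sample_window(mel: Sequence[Frame], length: int, rng: random.Random) -> Sequence[Frame]:
    """Crop a random contiguous window of `length` frames (hypothetical helper)."""
    if not 0 < length <= len(mel):
        raise ValueError(f"window length {length} is invalid for {len(mel)} frames")
    start = rng.randint(0, len(mel) - length)  # randint is inclusive on both ends
    return mel[start:start + length]


def multi_length_adv_losses(
    real_mel: Sequence[Frame],
    fake_mel: Sequence[Frame],
    discriminators: Dict[int, Discriminator],
    seed: int = 0,
) -> Tuple[float, float]:
    """Average least-squares GAN losses over one discriminator per window length.

    Assumption: LSGAN targets (real -> 1, fake -> 0). In real training the
    discriminator and generator losses are computed in separate update steps
    with an autodiff framework; here both come from the same scores.
    """
    rng = random.Random(seed)
    d_loss = 0.0
    g_loss = 0.0
    for length, disc in discriminators.items():
        real_score = disc(sample_window(real_mel, length, rng))
        fake_score = disc(sample_window(fake_mel, length, rng))
        d_loss += (real_score - 1.0) ** 2 + fake_score ** 2  # discriminator objective
        g_loss += (fake_score - 1.0) ** 2  # generator wants fakes scored as real
    n = len(discriminators)
    return d_loss / n, g_loss / n


def toy_discriminator(window: Sequence[Frame]) -> float:
    """Stand-in scorer: sigmoid of the window's mean value (not a learned model)."""
    total = sum(sum(frame) for frame in window)
    mean = total / (len(window) * len(window[0]))
    return 1.0 / (1.0 + math.exp(-mean))


if __name__ == "__main__":
    real = [[1.0] * 4 for _ in range(32)]  # 32 frames x 4 mel bins
    fake = [[-1.0] * 4 for _ in range(32)]
    discs = {8: toy_discriminator, 16: toy_discriminator, 32: toy_discriminator}
    d, g = multi_length_adv_losses(real, fake, discs)
    print(f"D loss {d:.4f}, G loss {g:.4f}")
```

Using a separate discriminator per window length is one common way to let short windows focus on local spectral detail while longer windows judge longer-range structure; whether HiFiDenoise shares or separates discriminators across lengths is not stated in the abstract.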
Pages: 7232-7236
Number of pages: 5