Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis

Cited: 3
Authors
Wang, Xin [1 ,2 ]
Takaki, Shinji [1 ]
Yamagishi, Junichi [1 ,2 ,3 ]
Affiliations
[1] Natl Inst Informat, Tokyo 1018430, Japan
[2] SOKENDAI, Tokyo 1018430, Japan
[3] Univ Edinburgh, CSTR, Edinburgh EH8 9LW, Midlothian, Scotland
Source
IEICE Transactions on Information and Systems
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
text-to-speech; speech synthesis; recurrent neural network; contexts; word embedding
DOI
10.1587/transinf.2016SLP0011
CLC Number (Chinese Library Classification)
TP [Automation and computer technology]
Discipline Code
0812
Abstract
Building high-quality text-to-speech (TTS) systems without expert knowledge of the target language and/or time-consuming manual annotation of speech and text data is an important yet challenging research topic. In this kind of TTS system, it is vital to find a representation of the input text that is both effective and easy to acquire. Recently, the continuous representation of raw word inputs, called "word embedding", has been used successfully in various natural language processing tasks. It has also been used as an additional or alternative linguistic input feature to a neural-network-based acoustic model for TTS systems. In this paper, we further investigate the use of this embedding technique to represent phonemes, syllables and phrases for acoustic models based on recurrent and feed-forward neural networks. Results of the experiments show that most of these continuous representations cannot significantly improve the system's performance when they are fed into the acoustic model either as an additional component or as a replacement for the conventional prosodic context. However, subjective evaluation shows that the continuous representation of phrases achieves a significant improvement when it is combined with the prosodic context as input to an acoustic model based on a feed-forward neural network.
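To illustrate the input scheme the abstract describes, the sketch below shows how a learned continuous representation (embedding) of a discrete linguistic unit such as a phrase can be concatenated with conventional prosodic-context features before a feed-forward acoustic model. This is a minimal PyTorch sketch, not the authors' implementation: the class name, all layer sizes, and the feature dimensions are illustrative assumptions.

    # Minimal sketch (illustrative, not the paper's code): augment prosodic-context
    # features with a jointly trained embedding of a linguistic unit (e.g. a phrase ID).
    import torch
    import torch.nn as nn

    class EmbeddingAugmentedAcousticModel(nn.Module):
        def __init__(self, num_units=500, embed_dim=64,
                     context_dim=389, acoustic_dim=187):
            super().__init__()
            # Lookup table mapping discrete unit IDs to continuous vectors,
            # trained together with the acoustic model.
            self.unit_embedding = nn.Embedding(num_units, embed_dim)
            # Feed-forward acoustic model: prosodic context + embedding in,
            # frame-level acoustic (vocoder) parameters out.
            self.net = nn.Sequential(
                nn.Linear(context_dim + embed_dim, 1024), nn.Tanh(),
                nn.Linear(1024, 1024), nn.Tanh(),
                nn.Linear(1024, acoustic_dim),
            )

        def forward(self, context, unit_ids):
            # context:  (batch, context_dim) prosodic-context features
            # unit_ids: (batch,) ID of the linguistic unit each frame belongs to
            e = self.unit_embedding(unit_ids)
            return self.net(torch.cat([context, e], dim=-1))

    model = EmbeddingAugmentedAcousticModel()
    ctx = torch.randn(8, 389)            # dummy prosodic-context features
    ids = torch.randint(0, 500, (8,))    # dummy phrase IDs
    out = model(ctx, ids)                # -> (8, 187) acoustic features

Replacing, rather than augmenting, the prosodic context would amount to dropping the `context` input and feeding the embedding alone, which is the alternative configuration the paper evaluates.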
Pages: 2471-2480
Number of pages: 10
Related Papers
50 in total (items [31]-[40] shown below)
  • [31] Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System using Deep Recurrent Neural Networks
    Valentini-Botinhao, Cassia
    Wang, Xin
    Takaki, Shinji
    Yamagishi, Junichi
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 352 - 356
  • [32] Low-level articulatory synthesis: A working text-to-speech solution and a linguistic tool
    Hill, David R.
    Taube-Schock, Craig R.
    Manzara, Leonard
    CANADIAN JOURNAL OF LINGUISTICS-REVUE CANADIENNE DE LINGUISTIQUE, 2017, 62 (03): : 371 - 410
  • [33] Gemination prediction using DNN for Arabic text-to-speech synthesis
    Ali, Ikbel Hadj
    Mnasri, Zied
    Lachiri, Zied
    2019 16TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2019, : 366 - 370
  • [34] Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration
    Yeshpanov, Rustem
    Mussakhojayeva, Saida
    Khassanov, Yerbolat
    arXiv, 2023
  • [35] Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration
    Yeshpanov, Rustem
    Mussakhojayeva, Saida
    Khassanov, Yerbolat
    INTERSPEECH 2023, 2023, : 5521 - 5525
  • [36] Lombard Speech Synthesis using Transfer Learning in a Tacotron Text-to-Speech System
    Bollepalli, Bajibabu
    Juvela, Lauri
    Alku, Paavo
    INTERSPEECH 2019, 2019, : 2833 - 2837
  • [37] Corpus-based Malay Text-to-Speech Synthesis System
    Swee, Tan Tian
    Salleh, Sheikh Hussain Shaikh
    2008 14TH ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS (APCC), VOLS 1 AND 2, 2008, : 52 - 56
  • [38] Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion
    Paul, Dipjyoti
    Shifas, Muhammed P. V.
    Pantazis, Yannis
    Stylianou, Yannis
    INTERSPEECH 2020, 2020, : 1361 - 1365
  • [39] A RULE BASED PROSODY MODEL FOR TURKISH TEXT-TO-SPEECH SYNTHESIS
    Uslu, Ibrahim Baran
    Ilk, Hakki Gokhan
    Yilmaz, Asim Egemen
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2013, 20 (02): : 217 - 223
  • [40] [Invited] Generative Model-Based Text-to-Speech Synthesis
    Zen, Heiga
    2018 IEEE 7TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE 2018), 2018, : 327 - 328