Advancements in Expressive Speech Synthesis: a Review

Cited by: 0
Authors
Alwaisi, Shaimaa [1]
Nemeth, Geza [1]
Affiliations
[1] Budapest Univ Technol & Econ, Fac Elect Engn & Informat, Dept Telecommun & Media Informat, Budapest, Hungary
Source
INFOCOMMUNICATIONS JOURNAL | 2024 / Vol. 16 / No. 1
Keywords
Speech style; Expressivity; Emotional speech; Expressive TTS; Prosody modification; Multilingual and multi-speaker TTS; SPEAKER ADAPTATION; VOICE CONVERSION; TEXT; TTS; MODEL
DOI
10.36244/ICJ.2024.1.5
Chinese Library Classification
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract
In recent years, we have witnessed the fast and widespread acceptance of speech synthesis technology, leading to a transition toward a society characterized by a strong desire to incorporate these applications into daily life. We provide a comprehensive survey of recent advancements in the field of expressive Text-To-Speech (TTS) systems. Among the different methods of representing expressivity, this paper focuses on the development of expressive TTS systems, emphasizing the methodologies employed to enhance the quality and expressiveness of synthetic speech, such as style transfer and improving speaker variability. We then review some of the subjective and objective metrics used to evaluate the quality of synthesized speech. Finally, we turn to child speech synthesis, a domain that has been neglected for some time, which underscores that research in children's speech synthesis is still wide open for exploration and development. Overall, this paper presents a comprehensive overview of historical and contemporary trends and future directions in speech synthesis research.
Pages: 35-46
Number of pages: 12