Advancements in Expressive Speech Synthesis: a Review

Cited by: 0
Authors
Alwaisi, Shaimaa [1 ]
Nemeth, Geza [1 ]
Affiliations
[1] Budapest Univ Technol & Econ, Fac Elect Engn & Informat, Dept Telecommun & Media Informat, Budapest, Hungary
Source
INFOCOMMUNICATIONS JOURNAL | 2024 / Vol. 16 / Issue 01
Keywords
Speech style; Expressivity; Emotional speech; Expressive TTS; Prosody modification; Multi-lingual and multi-speaker TTS; SPEAKER ADAPTATION; VOICE CONVERSION; TEXT; TTS; MODEL;
DOI
10.36244/ICJ.2024.1.5
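Chinese Library Classification (CLC) number
TN [Electronic Technology, Communication Technology];
Discipline classification code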
0809 ;
Abstract
In recent years, we have witnessed a fast and widespread acceptance of speech synthesis technology, leading to the transition toward a society characterized by a strong desire to incorporate these applications into daily life. We provide a comprehensive survey of recent advancements in the field of expressive Text-To-Speech (TTS) systems. Among the different methods of representing expressivity, this paper focuses on the development of expressive TTS systems, emphasizing the methodologies employed to enhance the quality and expressiveness of synthetic speech, such as style transfer and improving speaker variability. We then outline some of the subjective and objective metrics used to evaluate the quality of synthesized speech. Finally, we highlight child speech synthesis, a domain that has been neglected for some time and that remains wide open for exploration and development. Overall, this paper presents a comprehensive overview of historical and contemporary trends and future directions in speech synthesis research.
Pages: 35-46
Page count: 12
Related papers
50 in total
  • [21] Expressive Prosody for Unit-selection Speech Synthesis
    Strom, Volker
    Clark, Robert
    King, Simon
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1296 - 1299
  • [22] JOINT AND ADVERSARIAL TRAINING WITH ASR FOR EXPRESSIVE SPEECH SYNTHESIS
    Zhang, Kaili
    Gong, Cheng
    Lu, Wenhuan
    Wang, Longbiao
    Wei, Jianguo
    Liu, Dawei
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6322 - 6326
  • [23] SynPaFlex-Corpus: An Expressive French Audiobooks Corpus Dedicated to Expressive Speech Synthesis
    Sini, Aghilas
    Lolive, Damien
    Vidal, Gaelle
    Tahon, Marie
    Delais-Roussarie, Elisabeth
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 4289 - 4296
  • [24] Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources
    Barakat, Huda
    Turk, Oytun
    Demiroglu, Cenk
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [25] Expressive Speech Synthesis: Past, Present, and Possible Futures
    Schroeder, Marc
    AFFECTIVE INFORMATION PROCESSING, 2009, : 111 - 126
  • [26] Speech Modification for Prosody Conversion in Expressive Marathi Text-to-Speech Synthesis
    Anil, Manjare Chandraprabha
    Shirbahadurkar, S. D.
    2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2014, : 56 - 58
  • [27] Modeling the Acoustic Correlates of Expressive Elements in Text Genres for Expressive Text-to-Speech Synthesis
    Yang, Hongwu
    Meng, Helen M.
    Cai, Lianhong
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1806 - 1809
  • [28] What type of inputs will we need for expressive speech synthesis?
    Campbell, N
    PROCEEDINGS OF THE 2002 IEEE WORKSHOP ON SPEECH SYNTHESIS, 2002, : 95 - 98
  • [29] Rigid head motion in expressive speech animation: Analysis and synthesis
    Busso, Carlos
    Deng, Zhigang
    Grimm, Michael
    Neumann, Ulrich
    Narayanan, Shrikanth
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (03): : 1075 - 1086
  • [30] Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis
    Brognaux, Sandrine
    Francois, Thomas
    Saerens, Marco
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 3872 - 3879