Advancements in Expressive Speech Synthesis: a Review

Cited by: 0

Authors:

Alwaisi, Shaimaa [1]

Nemeth, Geza [1]

Affiliation:

[1] Budapest Univ Technol & Econ, Fac Elect Engn & Informat, Dept Telecommun & Media Informat, Budapest, Hungary

Source:

INFOCOMMUNICATIONS JOURNAL | 2024 / Vol. 16 / No. 01

Keywords:

Speech style; Expressivity; Emotional speech; Expressive TTS; Prosody modification; Multilingual and multi-speaker TTS; SPEAKER ADAPTATION; VOICE CONVERSION; TEXT; TTS; MODEL

DOI:

10.36244/ICJ.2024.1.5

Chinese Library Classification (CLC) number:

TN [Electronic technology, communication technology]

Discipline classification code:

0809

Abstract:

In recent years, we have witnessed a fast and widespread acceptance of speech synthesis technology, leading to the transition toward a society characterized by a strong desire to incorporate these applications into daily life. We provide a comprehensive survey of recent advancements in the field of expressive Text-To-Speech (TTS) systems. Among the different methods of representing expressivity, this paper focuses on the development of expressive TTS systems, emphasizing the methodologies employed to enhance the quality and expressiveness of synthetic speech, such as style transfer and improving speaker variability. After that, we point out some of the subjective and objective metrics that are used to evaluate the quality of synthesized speech. Finally, we turn to the realm of child speech synthesis, a domain that has long been neglected. This underscores that research on children's speech synthesis is still wide open for exploration and development. Overall, this paper presents a comprehensive overview of historical and contemporary trends and future directions in speech synthesis research.


Pages: 35-46

Number of pages: 12
