MEASURING THE EFFECT OF LINGUISTIC RESOURCES ON PROSODY MODELING FOR SPEECH SYNTHESIS

Times cited: 0

Authors:

Rosenberg, Andrew [1]

Fernandez, Raul [1]

Ramabhadran, Bhuvana [1]

Affiliation:

[1] IBM Research AI, T. J. Watson Research Center, Yorktown Heights, NY 10598, USA

Source:

2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018

Keywords:

prosody prediction; speech synthesis; low resources

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

The generation of natural and expressive prosodic contours is an important component of a text-to-speech (TTS) system. In most classical architectures, this component relies on a text-analysis processor that extracts prosody-predictive features and passes them to a statistical learning model. These features range from basic properties of the input string to rich, high-level features that may not always be available when developing a TTS system for a new language with sparse computational resources. In this work we investigate how the prosody model of a speech-synthesis system performs as a function of different predictive feature sets, each of which assumes access to a different amount of rich linguistic resources. Using objective metrics, we measure the effect of relaxing these assumptions on the input representations for prosody prediction in five languages, and we evaluate the perceptual implications for US English.
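The abstract says that feature sets are compared "using objective metrics" but does not name them. The following is a minimal sketch under stated assumptions: it scores predicted contours against natural ones with RMSE and Pearson correlation on per-syllable log-F0, which are common choices in prosody evaluation but are not confirmed by this record. The feature-set labels ("basic", "rich") and all numbers are hypothetical toy values, not results from the paper.

```python
import math
from typing import Dict, List, Sequence


def rmse(ref: Sequence[float], hyp: Sequence[float]) -> float:
    """Root-mean-square error between natural and predicted values."""
    if len(ref) != len(hyp) or not ref:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((r - h) ** 2 for r, h in zip(ref, hyp)) / len(ref))


def pearson(ref: Sequence[float], hyp: Sequence[float]) -> float:
    """Pearson correlation: how well the predicted contour tracks the natural one's shape."""
    if len(ref) != len(hyp) or len(ref) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(ref)
    mean_r, mean_h = sum(ref) / n, sum(hyp) / n
    cov = sum((r - mean_r) * (h - mean_h) for r, h in zip(ref, hyp))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_h = sum((h - mean_h) ** 2 for h in hyp)
    if var_r == 0 or var_h == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_r * var_h)


def compare_feature_sets(
    natural: Sequence[float], predictions: Dict[str, List[float]]
) -> Dict[str, Dict[str, float]]:
    """Score each (hypothetical) feature set's predictions against the natural contour."""
    return {
        name: {"rmse": rmse(natural, hyp), "corr": pearson(natural, hyp)}
        for name, hyp in predictions.items()
    }


if __name__ == "__main__":
    # Hypothetical per-syllable log-F0 values (toy numbers, not from the paper).
    natural = [5.0, 5.2, 5.1, 4.9, 4.8]
    predictions = {
        "basic": [5.05, 5.05, 5.05, 5.0, 4.95],  # hypothetical: string-level features only
        "rich": [5.0, 5.15, 5.1, 4.95, 4.8],     # hypothetical: adds high-level linguistic features
    }
    for name, scores in compare_feature_sets(natural, predictions).items():
        print(f"{name}: RMSE={scores['rmse']:.3f}  corr={scores['corr']:.3f}")
```

Reporting both metrics is a deliberate choice in this sketch: RMSE penalizes absolute pitch-level errors, while correlation rewards getting the contour's rises and falls right even when the overall level is offset.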


Pages: 5114–5118

Number of pages: 5
