MEASURING THE EFFECT OF LINGUISTIC RESOURCES ON PROSODY MODELING FOR SPEECH SYNTHESIS

Authors
Rosenberg, Andrew [1]
Fernandez, Raul [1]
Ramabhadran, Bhuvana [1]
Affiliation
[1] IBM Research AI, T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
prosody prediction; speech synthesis; low resources
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The generation of natural and expressive prosodic contours is an important component of a text-to-speech (TTS) system. In most classical architectures, this component relies on a text-analysis processor that extracts prosody-predictive features and passes them to a statistical learning model. These features can range from basic properties of the input string to rich high-level features that may not always be available when developing a TTS system for a new language with sparse computational resources. In this work we investigate how the prosody model of a speech-synthesis system performs as a function of different predictive feature sets, each of which assumes access to a different amount of rich resources. Using objective metrics, we investigate the effect of relaxing the assumptions on input representations for prosody prediction in five languages, and we evaluate the perceptual implications for US English.
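The abstract does not name its objective metrics. A common choice when scoring predicted prosodic contours is the RMSE and Pearson correlation of F0 computed over voiced frames, so the sketch below assumes those two metrics. It shows how models trained on different predictive feature sets could be compared against the same reference contour. The function names (`rmse`, `pearson`, `f0_metrics`, `rank_feature_sets`), the feature-set labels "rich" and "basic", the convention that F0 <= 0 marks an unvoiced frame, and the toy contours are all hypothetical illustrations, not the authors' method or data.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def pearson(pred, ref):
    """Pearson correlation coefficient; returns NaN if either sequence is constant."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in ref))
    if std_p == 0 or std_r == 0:
        return float("nan")
    return cov / (std_p * std_r)


def f0_metrics(pred_hz, ref_hz):
    """Score a predicted F0 contour (Hz) against a reference contour.

    Assumed convention: frames with F0 <= 0 are unvoiced, and only frames
    voiced in both contours are scored.
    """
    pairs = [(p, r) for p, r in zip(pred_hz, ref_hz) if p > 0 and r > 0]
    if not pairs:
        raise ValueError("no frames are voiced in both contours")
    pred = [p for p, _ in pairs]
    ref = [r for _, r in pairs]
    return {"rmse_hz": rmse(pred, ref), "corr": pearson(pred, ref)}


def rank_feature_sets(ref_hz, predictions_by_set):
    """Rank hypothetical feature sets by F0 RMSE (lower is better)."""
    scores = {name: f0_metrics(pred, ref_hz)
              for name, pred in predictions_by_set.items()}
    return sorted(scores.items(), key=lambda item: item[1]["rmse_hz"])


if __name__ == "__main__":
    # Toy contours in Hz (0 = unvoiced); values are invented for illustration.
    reference = [0, 120, 130, 140, 0, 110]
    predictions = {
        "rich": [0, 122, 128, 141, 0, 112],    # e.g. model with high-level features
        "basic": [0, 110, 140, 130, 90, 100],  # e.g. model with string-level features only
    }
    for name, metrics in rank_feature_sets(reference, predictions):
        print(f"{name}: RMSE={metrics['rmse_hz']:.2f} Hz, r={metrics['corr']:.3f}")
```

Scoring only jointly voiced frames keeps voicing-decision errors from inflating the F0 error; a fuller evaluation would typically report voicing errors separately.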
Pages: 5114-5118
Number of pages: 5