Fluent speech prosody: Framework and modeling

Cited by: 78
Authors
Tseng, CY [1]
Pin, SH
Lee, Y
Wang, HM
Chen, YC
Affiliations
[1] Acad Sinica, Phonet Lab, Inst Linguist, Taipei, Taiwan
[2] Acad Sinica, Inst Sci Informat, Taipei, Taiwan
Keywords
prosodic phrase grouping; top-down; PG; prosodic hierarchy; multi-phrase; cross-phrase; constraints; templates; speech planning; look-ahead; global F0 templates; temporal allocations; syllable duration patterns; intensity distribution; boundary breaks
DOI
10.1016/j.specom.2005.03.015
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The prosody of fluent connected speech is much more complicated than concatenating individual sentence intonations into strings. We analyzed speech corpora of read Mandarin Chinese discourse from a top-down perspective on perceived units and boundaries, and consistently identified speech paragraphs of multiple phrases that reflected discourse rather than sentence effects in fluent speech. Subsequent cross-speaker and cross-speaking-rate acoustic analyses of the identified speech paragraphs revealed systematic cross-phrase prosodic patterns in every acoustic parameter, namely F0 contours, duration adjustment, intensity patterns, and, in addition, boundary breaks. We therefore argue for a higher prosodic node that governs, constrains, and groups phrases to derive speech paragraphs. A hierarchical multi-phrase framework is constructed to account for this governing effect, supported by complementary production and perception evidence. We show how cross-phrase F0 and syllable-duration templates are derived to account for the tune and rhythm characteristic of fluent speech prosody, and argue for a prosody framework that specifies phrasal intonations as subjacent sister constituents subject to higher-level terms. Output fluent speech prosody is thus the cumulative result of contributions from every prosodic layer. To test our framework, we further construct a modular prosody model of multiple-phrase grouping with four corresponding acoustic modules and begin testing the model with speech synthesis. To conclude, we argue that any prosody framework of fluent speech should include prosodic contributions above individual sentences in production, with consideration of their perceptual effects on on-line processing, and that the development of unlimited TTS could benefit most appreciably from capturing and including cross-phrase relationships in prosody modeling. (c) 2005 Published by Elsevier B.V.
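The abstract's claim that output prosody is the cumulative result of contributions from every prosodic layer can be illustrated with a minimal additive (superposition) sketch. It assumes, purely for illustration, three layers measured in semitones: a prosodic-phrase-group (PG) template keyed on a phrase's position in the group, a linear within-phrase declination, and per-syllable tone offsets. The names `PG_TEMPLATE`, `pg_position`, and `compose_f0`, and every numeric value, are hypothetical; this is not the authors' model, only a sketch of the layered-sum idea.

```python
from typing import Dict, List

# Hypothetical PG-layer offsets (semitones) by phrase position in the group.
# Illustrative values only, not taken from the paper.
PG_TEMPLATE: Dict[str, float] = {"initial": 2.0, "medial": 0.0, "final": -2.0}


def pg_position(index: int, count: int) -> str:
    """Label a phrase as PG-initial, -medial, or -final by its index.

    A single-phrase group is treated as PG-initial in this sketch.
    """
    if index == 0:
        return "initial"
    if index == count - 1:
        return "final"
    return "medial"


def compose_f0(phrases: List[List[float]], base: float = 0.0,
               declination: float = -0.5) -> List[List[float]]:
    """Sum layered contributions for every syllable of every phrase.

    phrases: one list of syllable-level (tone) offsets per phrase.
    Each output value is:
        base + PG offset + declination * syllable index + syllable offset
    """
    n = len(phrases)
    result: List[List[float]] = []
    for i, syllables in enumerate(phrases):
        pg_offset = PG_TEMPLATE[pg_position(i, n)]
        result.append([base + pg_offset + declination * j + s
                       for j, s in enumerate(syllables)])
    return result


if __name__ == "__main__":
    # Three phrases in one hypothetical PG: initial, medial, final.
    print(compose_f0([[1.0, -1.0], [0.0, 0.0], [0.5]]))
```

With these assumed values the three-phrase example yields `[[3.0, 0.5], [0.0, -0.5], [-1.5]]`: identical syllable-level input surfaces at different heights depending on the phrase's place in the group, which is the kind of cross-phrase effect that a sentence-by-sentence concatenation would miss.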
Pages: 284-309
Page count: 26
Related papers (50 in total)
  • [1] Fluent speech prosody: Framework and modeling
    Tseng, Chiu-Yu
    Pin, Shao-Huang
    Lee, Yehlin
    Wang, Hsin-Min
    Chen, Yong-Cheng
    SPEECH COMMUNICATION, 2005, (3-4) : 284 - 309
  • [2] An interaction between prosody and statistics in the segmentation of fluent speech
    Shukla, Mohinish
    Nespor, Marina
    Mehler, Jacques
    COGNITIVE PSYCHOLOGY, 2007, 54 (01) : 1 - 32
  • [3] PROSODY MODELING FOR MANDARIN EXCLAMATORY SPEECH
    Jia, Huibin
    Tao, Jianhua
    ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009 : 890 - 893
  • [4] Prosody and fluent reading
    Picanco, Gessiane Lobato
    Vansiler, Nair Sauaia
    GRAGOATA-UFF, 2014, 19 (36) : 157 - 174
  • [5] Prosody modeling for automatic speech recognition and understanding
    Shriberg, E
    Stolcke, A
    MATHEMATICAL FOUNDATIONS OF SPEECH AND LANGUAGE PROCESSING, 2004, 138 : 105 - 114
  • [6] Prosody analysis and modeling for emotional speech synthesis
    Jiang, DN
    Zhang, W
    Shen, LQ
    Cai, LH
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005 : 281 - 284
  • [7] Hierarchical prosody modeling for Mandarin spontaneous speech
    Lin, Cheng-Hsien
    You, Chung-Long
    Chiang, Chen-Yu
    Wang, Yih-Ru
    Chen, Sin-Horng
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 145 (04) : 2576 - 2596
  • [8] Modeling arabic prosody for a text-to-speech system
    Boukadida, F.
    Ellouze, N.
    International Review on Computers and Software, 2009, 4 (03) : 337 - 343
  • [9] Modeling prosody for language identification on read and spontaneous speech
    Rouas, JL
    Farinas, J
    Pellegrino, F
    André-Obrecht, R
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003 : 753 - 756
  • [10] Modeling prosody for language identification on read and spontaneous speech
    Rouas, JL
    Farinas, J
    Pellegrino, F
    André-Obrecht, R
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003 : 40 - 43