Developments in corpus-based speech synthesis: Approaching natural conversational speech

Cited by: 19
Authors
Campbell, N [1]
Affiliations
[1] ATR Network Informat Labs, Dept Emergent Commun, Kyoto 6190288, Japan
Source
Keywords
speech synthesis; corpora; concatenation; paralinguistic information; communication; affect;
DOI
10.1093/ietisy/e88-d.3.376
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit-selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and changes the focus of unit selection from segmental or phonetic continuity to one of prosodic and discoursal appropriateness instead. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.
Pages: 376 - 383
Page count: 8
Related papers
50 in total
  • [31] A Corpus-Based Approach to Speech Enhancement from Nonstationary Noise
    Ming, Ji
    Srinivasan, Ramji
    Crookes, Danny
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 1097 - 1100
  • [32] Corpus-Based Speech Enhancement With Uncertainty Modeling and Cepstral Smoothing
    Nickel, Robert M.
    Astudillo, Ramon Fernandez
    Kolossa, Dorothea
    Martin, Rainer
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (05): : 983 - 997
  • [33] A Corpus-based Analysis of Mixed Code in Hong Kong Speech
    Lee, John
    2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012), 2012, : 165 - 168
  • [34] Speech Database Reduction Method for Corpus-Based TTS System
    Isogai, Mitsuaki
    Mizuno, Hideyuki
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 158 - 161
  • [35] A Corpus-Based Approach to Speech Enhancement From Nonstationary Noise
    Ming, Ji
    Srinivasan, Ramji
    Crookes, Danny
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04): : 822 - 836
  • [36] Mandarin Chinese words and parts of speech: A corpus-based study
    Ren, Yi
    CHINESE LANGUAGE AND DISCOURSE, 2020, 11 (02) : 371 - 375
  • [37] A new Korean corpus-based text-to-speech system
    Kim S.
    Lee Y.
    Hirose K.
    International Journal of Speech Technology, 2002, 5 (02) : 105 - 116
  • [38] COLLECTION AND ANNOTATION OF MALAY CONVERSATIONAL SPEECH CORPUS
    Chong, Tze Yuang
    Xiao, Xiong
    Tan, Tien-Ping
    Chng, Eng Siong
    Li, Haizhou
    2012 INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2012, : 30 - 35
  • [39] Filled pauses in speech synthesis: Towards conversational speech
    Adell, Jordi
    Bonafonte, Antonio
    Escudero, David
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2007, 4629 : 358 - +
  • [40] Unit selection algorithm using Bi-grams model for corpus-based speech synthesis
    Kammoun, Mohamed Ali
    Hamida, Ahmed Ben
    World Academy of Science, Engineering and Technology, 2009, 35 : 722 - 727