Developments in corpus-based speech synthesis: Approaching natural conversational speech

Cited by: 19

Authors:

Campbell, N [1]

Affiliations:

[1] ATR Network Informat Labs, Dept Emergent Commun, Kyoto 6190288, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2005 / Vol. E88D / No. 03

Keywords:

speech synthesis; corpora; concatenation; paralinguistic information; communication; affect

DOI:

10.1093/ietisy/e88-d.3.376

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and shifts the focus of unit selection from segmental or phonetic continuity to prosodic and discoursal appropriateness. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.


Pages: 376-383

Page count: 8

