Developments in corpus-based speech synthesis: Approaching natural conversational speech

Cited by: 19

Authors:

Campbell, N [1]

Affiliations:

[1] ATR Network Informat Labs, Dept Emergent Commun, Kyoto 6190288, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2005 / Vol. E88D / No. 03

Keywords:

speech synthesis; corpora; concatenation; paralinguistic information; communication; affect;

DOI:

10.1093/ietisy/e88-d.3.376

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and shifts the focus of unit selection from segmental or phonetic continuity to prosodic and discoursal appropriateness. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.


Pages: 376 - 383

Number of pages: 8

50 items in total

[21] Fundamental frequency modeling for corpus-based speech synthesis based on a statistical learning technique
Sakai, S
Glass, J
ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003: 712 - 717
[22] Recent progress in corpus-based spontaneous speech recognition
Furui, S
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (03): 366 - 375
[23] A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese
Chou, FC
Tseng, CY
Lee, LS
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2002, 10 (07): 481 - 494
[24] A corpus-based study of reviewers' usage of speech acts
Nasser, Marwa Adel
COGENT ARTS & HUMANITIES, 2022, 9 (01):
[25] Corpus-based approaches to the phonological analysis of speech Introduction
Kubozono, Haruo
Maekawa, Kikuo
Vance, Timothy J.
LABORATORY PHONOLOGY, 2015, 6 (3-4): 279 - 280
[26] A CORPUS-BASED STUDY OF REPAIR CUES IN SPONTANEOUS SPEECH
NAKATANI, CH
HIRSCHBERG, J
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1994, 95 (03): 1603 - 1616
[27] Modal Particles in indirect Speech A corpus-based Study
Thurmair, Maria
SPRACHWISSENSCHAFT, 2019, 44 (01): 1 - 72
[28] A Corpus-Based Approach to the Study of Speech Act of Thanking
Cheng, Stephanie W.
CONCENTRIC-STUDIES IN LINGUISTICS, 2010, 36 (02): 257 - 274
[29] Corpus-based Mandarin speech synthesis with contextual syllabic units based on phonetic properties
Chou, FC
Tseng, CY
PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998: 893 - 896
[30] Decision Tree-based Training of Probabilistic Concatenation Models for Corpus-based Speech Synthesis
Sakai, Shinsuke
Kawahara, Tatsuya
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 1746 - 1749
