A RESEARCH BED FOR UNIT SELECTION BASED TEXT TO SPEECH SYNTHESIS

Cited by: 0
Authors
Sarathy, K. Partha [1 ]
Ramakrishnan, A. G. [2 ]
Affiliations
[1] Ctr Dev Telemat, Bangalore 560100, Karnataka, India
[2] Indian Inst Sci, Dept Elect Engn, Bangalore 560100, Karnataka, India
Keywords
speech synthesis; speech codecs; intelligibility; naturalness; perception
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The paper describes a modular, unit selection based TTS framework that serves as a research bed both for developing TTS in any new language and for studying the effect of changing any parameter during synthesis. Using this framework, a TTS system has been developed for Tamil; the synthesis database consists of 1027 phonetically rich prerecorded sentences. The framework has already been tested for Kannada. The TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized for embedded applications such as mobile phones and PDAs: the synthesis speech database was compressed with the standard speech compression algorithms used in commercial GSM phones, and the quality of the resulting synthesized sentences was evaluated. Even with a highly compressed database, the synthesized output is perceptually close to that obtained with the uncompressed database. Through experiments, the ambiguities in human perception of Tamil phones and syllables uttered in isolation were explored, and it is proposed to exploit this misperception to substitute for missing phone contexts in the database. Listening experiments were conducted on sentences synthesized by deliberately replacing phones with their perceptually confusable counterparts.
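As a rough illustration of the unit-selection step the abstract summarizes, the sketch below runs a Viterbi search over per-target candidate units, combining a target cost (distance from the desired pitch and duration) with a join cost at each concatenation point. The Unit fields and both cost functions are simplified placeholders for illustration only, not the costs used in the paper.

# Minimal unit-selection sketch (illustrative; the Unit fields and
# cost functions are simplified placeholders, not the paper's).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Unit:
    phone: str       # phone label of the prerecorded unit
    pitch: float     # mean F0 of the unit (Hz)
    duration: float  # unit duration (s)

def target_cost(spec_pitch: float, spec_dur: float, u: Unit) -> float:
    # Mismatch between the target specification and a candidate unit.
    return abs(spec_pitch - u.pitch) / 100.0 + abs(spec_dur - u.duration)

def join_cost(a: Unit, b: Unit) -> float:
    # Pitch discontinuity at the concatenation point.
    return abs(a.pitch - b.pitch) / 100.0

def select_units(targets: List[Tuple[float, float]],
                 candidates: List[List[Unit]]) -> List[Unit]:
    # Viterbi search: targets[i] = (pitch, duration) spec for position i,
    # candidates[i] = database units whose phone matches target i.
    n = len(targets)
    cost = [[0.0] * len(c) for c in candidates]
    back = [[0] * len(c) for c in candidates]
    for j, u in enumerate(candidates[0]):
        cost[0][j] = target_cost(*targets[0], u)
    for i in range(1, n):
        for j, u in enumerate(candidates[i]):
            prev = min(range(len(candidates[i - 1])),
                       key=lambda k: cost[i - 1][k]
                       + join_cost(candidates[i - 1][k], u))
            cost[i][j] = (cost[i - 1][prev]
                          + join_cost(candidates[i - 1][prev], u)
                          + target_cost(*targets[i], u))
            back[i][j] = prev
    # Trace back the lowest-total-cost unit sequence.
    j = min(range(len(candidates[-1])), key=lambda k: cost[-1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]

# Example: pick units for two target phones from tiny candidate pools.
pool = [[Unit("a", 118.0, 0.09), Unit("a", 132.0, 0.11)],
        [Unit("k", 121.0, 0.05), Unit("k", 140.0, 0.06)]]
print(select_units([(120.0, 0.10), (120.0, 0.05)], pool))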
Pages: 229+
Number of pages: 2
Related Papers (50 in total)
  • [21] Multimodal Unit Selection for 2D Audiovisual Text-to-Speech Synthesis
    Mattheyses, Wesley
    Latacz, Lukas
    Verhelst, Werner
    Sahli, Hichem
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, PROCEEDINGS, 2008, 5237 : 125 - 136
  • [22] A Unit Selection Text-to-Speech Synthesis System Optimized for Use with Screen Readers
    Chalamandaris, Aimilios
    Karabetsos, Sotiris
    Tsiakoulis, Pirros
    Raptis, Spyros
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2010, 56 (03) : 1890 - 1897
  • [23] Unit Selection based Speech Synthesis for Poor Channel Condition
    Cen, Ling
    Dong, Minghui
    Chan, Paul
    Li, Haizhou
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2035 - 2038
  • [24] Triphone based unit selection for concatenative visual speech synthesis
    Huang, FJ
    Cosatto, E
    Graf, HP
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 2037 - 2040
  • [25] Unit selection speech synthesis in noise
    Cernak, Milos
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 761 - 764
  • [26] Speech unit selection based on target values driven by speech data in concatenative speech synthesis
    Hirai, T
    Tenpaku, S
    Shikano, K
    PROCEEDINGS OF THE 2002 IEEE WORKSHOP ON SPEECH SYNTHESIS, 2002, : 43 - 46
  • [27] Minimum unit selection error training for HMM-based unit selection speech synthesis system
    Ling, Zhen-Hua
    Wang, Ren-Hua
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 3949 - 3952
  • [28] Optimization method for unit selection speech synthesis based on synthesis quality predictions
    Ling, Zhen-Hua (zhling@ustc.edu.cn)
    2013, Tsinghua University (53)
  • [29] Unit selection algorithm for Japanese speech synthesis based on both phoneme unit and diphone unit
    Toda, T
    Kawai, H
    Tsuzaki, M
    Shikano, K
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 465 - 468
  • [30] Scalable implementation of unit selection based text-to-speech system for embedded solutions
    Nukaga, Nobuo
    Kamoshida, Ryota
    Nagamatsu, Kenji
    Kitahara, Yoshinori
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 849 - 852