Predicting the Quality of Text-To-Speech Systems from a Large-Scale Feature Set

Cited by: 0

Authors:

Hinterleitner, Florian [1]

Norrenbrock, Christoph R. [2]

Moeller, Sebastian [1]

Heute, Ulrich [2]

Affiliations:

[1] TU Berlin, Quality & Usability Lab, Berlin, Germany

[2] CAU Kiel, Digital Signal Processing and System Theory, Kiel, Germany

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

quality prediction; text-to-speech (TTS); cross-validation

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We extract 1495 speech features from two subjectively evaluated text-to-speech (TTS) databases. The features are derived from pitch, loudness, MFCCs, spectral descriptors, formants, and intensity. The speech material was synthesized with up to 15 different TTS systems, some of them with up to 8 different voices. To handle this large feature set, we develop quality predictors for TTS signals following two approaches: a three-step feature selection followed by stepwise multiple linear regression, and an approach based on support vector machines. The predictors are validated with 3-fold cross-validation (CV) and leave-one-test-out (LOTO) CV. Because of the high number of features, we apply a strict CV method in which the data are partitioned before the feature scaling and feature selection steps. For comparison, we also follow a semi-strict approach in which the partitioning effectively takes place after these steps. With 3-fold CV we achieve correlations as high as .75 for strict CV and .89 for semi-strict CV. The more demanding LOTO CV yields correlations around .80 for the male speakers, whereas the results for the female voices show that further improvement is needed.
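The difference between strict and semi-strict CV can be illustrated with a small sketch, assuming NumPy and synthetic data. It is not the authors' pipeline: selecting the top-k features by absolute Pearson correlation stands in for their three-step feature selection, ordinary least squares stands in for stepwise multiple linear regression, and all helper names (`make_folds`, `select_top_k`, `cross_validate`, etc.) are hypothetical. With pure-noise features, the semi-strict variant will typically report a clearly positive correlation, because the held-out partitions have already influenced scaling and feature selection.

```python
import numpy as np

def zscore_fit(X):
    """Per-feature mean and standard deviation (zero std replaced by 1)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return mu, sd

def select_top_k(X, y, k):
    """Hypothetical stand-in selector: k features with largest |Pearson r| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant features get r = 0
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(r))[:k]

def fit_ols(X, y):
    """Least-squares linear regression with intercept (stand-in for stepwise MLR)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def make_folds(n, n_folds=3, groups=None, seed=0):
    """Random k-fold split, or one fold per group label (LOTO-style)."""
    if groups is not None:
        # every distinct test/database label becomes one held-out fold
        return [np.flatnonzero(groups == g) for g in np.unique(groups)]
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), n_folds)

def cross_validate(X, y, k_feat, folds, strict):
    """Pooled Pearson r between out-of-fold predictions and targets.

    strict=True: scaling and selection are fitted on each training partition only.
    strict=False (semi-strict): scaling and selection see all data before the split.
    """
    n = len(y)
    if not strict:
        mu, sd = zscore_fit(X)
        Xs = (X - mu) / sd
        feats = select_top_k(Xs, y, k_feat)
    preds = np.empty(n, dtype=float)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        if strict:
            mu, sd = zscore_fit(X[train_idx])  # statistics from training data only
            Xs = (X - mu) / sd
            feats = select_top_k(Xs[train_idx], y[train_idx], k_feat)
        coef = fit_ols(Xs[train_idx][:, feats], y[train_idx])
        preds[test_idx] = predict_ols(coef, Xs[test_idx][:, feats])
    return np.corrcoef(preds, y)[0, 1]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((60, 1495))  # pure-noise features, no real signal
    y = rng.standard_normal(60)          # synthetic quality ratings
    folds = make_folds(60, n_folds=3)
    for strict in (True, False):
        r = cross_validate(X, y, k_feat=10, folds=folds, strict=strict)
        print("strict" if strict else "semi-strict", round(r, 2))
```

Passing `groups` (for example, one label per listening test or database) turns the folds into a leave-one-test-out split in the spirit of the LOTO CV described in the abstract.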


Pages: 383-387

Number of pages: 5
