Predicting the Quality of Text-To-Speech Systems from a Large-Scale Feature Set

Cited by: 0
Authors
Hinterleitner, Florian [1]
Norrenbrock, Christoph R. [2]
Moeller, Sebastian [1]
Heute, Ulrich [2]
Affiliations
[1] TU Berlin, Qual & Usabil Lab, Berlin, Germany
[2] CAU Kiel, Digital Signal Proc & Syst Theory, Kiel, Germany
Keywords
quality prediction; text-to-speech (TTS); cross-validation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We extract 1495 speech features from two subjectively evaluated text-to-speech (TTS) databases. The features are derived from pitch, loudness, MFCCs, spectral features, formants, and intensity. The speech material is synthesized using up to 15 different TTS systems, some of them with up to 8 different voices. We develop quality predictors for TTS signals following two different approaches to handle the large feature set: a three-step feature selection followed by a stepwise multiple linear regression, and an approach based on support vector machines. The predictors are validated via 3-fold cross-validation (CV) and leave-one-test-out (LOTO) CV. Because of the high number of features, we apply a strict CV method in which the data are partitioned before the feature scaling and feature selection steps, so that these steps see only the training partition. For comparison, we also follow a semi-strict approach in which the partitioning effectively takes place after these steps. In the 3-fold CV case we achieve correlations as high as 0.75 for strict CV and 0.89 for semi-strict CV. The more ambitious LOTO CV yields correlations around 0.80 for the male speakers, whereas the results for the female voices show the need for improvement.
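
The difference between strict and semi-strict CV described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the correlation-based selection, the toy one-dimensional regression, the synthetic noise data, and all function names (`select_features`, `composite`, `cross_validate`) are assumptions made for illustration. In the strict variant, scaling statistics and selected features come from the training folds only; in the semi-strict variant they are computed on all rows before partitioning.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def select_features(X, y, rows, k):
    """Indices of the k features with the largest |correlation| to y on `rows`."""
    ys = [y[i] for i in rows]
    scores = [(abs(pearson([X[i][j] for i in rows], ys)), j)
              for j in range(len(X[0]))]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def composite(X, y, scale_rows, train, feats):
    """Build a scoring function: sum of z-scored, sign-aligned features.
    Scaling statistics come from `scale_rows`; signs come from `train`."""
    y_tr = [y[i] for i in train]
    params = []
    for j in feats:
        col = [X[i][j] for i in scale_rows]
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0
        sign = 1.0 if pearson([X[i][j] for i in train], y_tr) >= 0 else -1.0
        params.append((j, mu, sd, sign))
    return lambda i: sum(s * (X[i][j] - mu) / sd for j, mu, sd, s in params)

def cross_validate(X, y, n_folds=3, n_feats=5, strict=True, seed=0):
    """Correlation between out-of-fold predictions and targets."""
    rows = list(range(len(y)))
    random.Random(seed).shuffle(rows)
    folds = [rows[f::n_folds] for f in range(n_folds)]
    # Semi-strict: selection is done once, on all rows, before partitioning.
    global_feats = None if strict else select_features(X, y, rows, n_feats)
    preds = {}
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        feats = select_features(X, y, train, n_feats) if strict else global_feats
        scale_rows = train if strict else rows
        score = composite(X, y, scale_rows, train, feats)
        # One-dimensional least squares y ~ a + b * score, fitted on training rows.
        s_tr = [score(i) for i in train]
        y_tr = [y[i] for i in train]
        ms, my = sum(s_tr) / len(s_tr), sum(y_tr) / len(y_tr)
        var = sum((s - ms) ** 2 for s in s_tr)
        b = sum((s - ms) * (t - my) for s, t in zip(s_tr, y_tr)) / var if var else 0.0
        a = my - b * ms
        for i in test:
            preds[i] = a + b * score(i)
    return pearson([preds[i] for i in rows], [y[i] for i in rows])

# Synthetic data: many pure-noise features, few samples (values are assumptions).
rng = random.Random(1)
n_samples, n_features = 45, 300
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [rng.gauss(0, 1) for _ in range(n_samples)]

r_strict = cross_validate(X, y, strict=True)
r_semi = cross_validate(X, y, strict=False)
print(f"strict CV correlation:      {r_strict:.2f}")
print(f"semi-strict CV correlation: {r_semi:.2f}")
```

Because the features here carry no information about the target, any correlation reported by the semi-strict variant comes from selection having already seen the held-out rows. With many features and few samples, that leakage tends to inflate the estimate, which is consistent with the gap between the strict and semi-strict figures reported in the abstract.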
Pages: 383-387
Page count: 5