Predicting the Quality of Text-To-Speech Systems from a Large-Scale Feature Set

Cited by: 0
Authors
Hinterleitner, Florian [1]
Norrenbrock, Christoph R. [2]
Moeller, Sebastian [1]
Heute, Ulrich [2]
Affiliations
[1] TU Berlin, Quality & Usability Lab, Berlin, Germany
[2] CAU Kiel, Digital Signal Processing & System Theory, Kiel, Germany
Keywords
quality prediction; text-to-speech (TTS); cross-validation
DOI
Not available
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We extract 1495 speech features from two subjectively evaluated text-to-speech (TTS) databases. The features are derived from pitch, loudness, Mel-frequency cepstral coefficients (MFCCs), spectral characteristics, formants, and intensity. The speech material was synthesized with up to 15 different TTS systems, some of them with up to 8 different voices. To handle this large feature set, we develop quality predictors for TTS signals following two approaches: a three-step feature selection followed by stepwise multiple linear regression, and an approach based on support vector machines. The predictors are evaluated with 3-fold cross-validation (CV) and leave-one-test-out (LOTO) CV. Because of the large number of features, we apply a strict CV method in which the data are partitioned before the feature scaling and feature selection steps. For comparison, we also follow a semi-strict approach in which the partitioning effectively takes place after these steps. With 3-fold CV we achieve correlations as high as .75 for strict CV and .89 for semi-strict CV. The more demanding LOTO CV yields correlations around .80 for the male speakers, whereas the results for the female voices show that further improvement is needed.
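The difference between strict and semi-strict CV can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes z-score feature scaling, replaces the three-step feature selection and stepwise regression with a single filter that keeps the k features most correlated with the MOS, fits least squares on those features, and uses synthetic data. All function names (`cross_validate`, `select_top_k`, `kfold`, `leave_one_group_out`, etc.) are illustrative. In strict mode, scaling and selection are fitted on each training partition only; in semi-strict mode they are fitted once on all samples, so the held-out stimuli influence which features are chosen.

```python
# Strict vs. semi-strict CV for a feature-based TTS quality predictor (sketch; assumptions above).
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def fit_scaler(rows):
    """Per-feature mean and standard deviation (a zero std is replaced by 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def apply_scaler(rows, scaler):
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def select_top_k(rows, y, k):
    """Hypothetical filter: indices of the k features with the largest |r| to y."""
    cols = list(zip(*rows))
    return sorted(range(len(cols)), key=lambda j: -abs(pearson(cols[j], y)))[:k]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y, ridge=1e-8):
    """Least squares with intercept via normal equations; tiny ridge for stability."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(w, rows):
    return [w[0] + sum(wi * v for wi, v in zip(w[1:], r)) for r in rows]

def cross_validate(X, y, folds, k, strict):
    """Pearson r between pooled out-of-fold predictions and the MOS values."""
    if not strict:
        # Semi-strict: scaler and filter are fitted on ALL samples, test folds included.
        X = apply_scaler(X, fit_scaler(X))
        feats = select_top_k(X, y, k)
    preds = [0.0] * len(y)
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        Xte = [X[i] for i in test]
        if strict:
            # Strict: partition first; scaler and filter see the training fold only.
            scaler = fit_scaler(Xtr)
            Xtr, Xte = apply_scaler(Xtr, scaler), apply_scaler(Xte, scaler)
            feats = select_top_k(Xtr, ytr, k)
        w = fit_ols([[r[j] for j in feats] for r in Xtr], ytr)
        for i, p in zip(test, predict(w, [[r[j] for j in feats] for r in Xte])):
            preds[i] = p
    return pearson(preds, y)

def kfold(n, k, seed=0):
    """Shuffled k-fold partition of range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def leave_one_group_out(groups):
    """One fold per group label, e.g. per listening test (LOTO-style)."""
    return [[i for i, g in enumerate(groups) if g == lab] for lab in sorted(set(groups))]

if __name__ == "__main__":
    rng = random.Random(1)
    n, d = 40, 300  # hypothetical: 40 stimuli, 300 pure-noise features
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [rng.gauss(3, 1) for _ in range(n)]  # synthetic MOS, unrelated to X
    folds = kfold(n, 3)
    print("strict r:     ", round(cross_validate(X, y, folds, 5, strict=True), 2))
    print("semi-strict r:", round(cross_validate(X, y, folds, 5, strict=False), 2))
```

On pure-noise features such as these, the semi-strict variant typically reports a clearly positive correlation while the strict variant stays near zero, because the feature filter has already seen the test stimuli. A LOTO-style evaluation corresponds to passing `leave_one_group_out(groups)` as the folds, assuming each stimulus carries a label for the listening test it belongs to.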
Pages: 383-387
Page count: 5