Intelligibility of time-compressed synthetic speech: Compression method and speaking style

Cited by: 4
Authors
Valentini-Botinhao, Cassia [1]
Toman, Markus [2]
Pucher, Michael [2]
Schabus, Dietmar [2]
Yamagishi, Junichi [1,3]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
[2] Telecommun Res Ctr Vienna FTW, Vienna, Austria
[3] Natl Inst Informat, Tokyo, Japan
Keywords
Fast speech; Time-compression; HMM-based speech synthesis; Blind individuals; ALGORITHMS;
DOI
10.1016/j.specom.2015.09.002
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
We present a series of intelligibility experiments performed on natural and synthetic speech time-compressed at a range of rates and analyze the effect of speech corpus and compression method on the intelligibility scores of sighted and blind individuals. In particular, we are interested in comparing linear and non-linear compression methods applied to normal and fast speech of different speakers. We recorded English and German language voice talents reading prompts at a normal and a fast rate. To create synthetic voices we trained a statistical parametric speech synthesis system based on the normal and the fast data of each speaker. We compared three compression methods: scaling the variance of the state duration model, interpolating the duration models of the fast and the normal voices, and applying a linear compression method to the generated speech waveform. Word recognition results for the English voices show that generating speech at a normal speaking rate and then applying linear compression resulted in the most intelligible speech at all tested rates. A similar result was found when evaluating the intelligibility of the natural speech corpus. For the German voices, interpolation was found to be better at moderate speaking rates but the linear method was again more successful at very high rates, particularly when applied to the fast data. Phonemic level annotation of the normal and fast databases showed that the German speaker was able to reproduce speech at a fast rate with fewer deletion and substitution errors compared to the English speaker, supporting the intelligibility benefits observed when compressing his fast speech. This shows that the use of fast speech data to create faster synthetic voices does not necessarily lead to more intelligible voices, as results are highly dependent on how successful the speaker was at speaking fast while maintaining intelligibility. Linear compression applied to normal rate speech can more reliably provide higher intelligibility, particularly at ultra fast rates. (C) 2015 Elsevier B.V. All rights reserved.
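The three compression methods named in the abstract can be sketched at the level of per-state durations. This is only an illustrative sketch, not the authors' implementation: it assumes each HMM state has a Gaussian duration model with a mean and variance (in frames), uses the duration rule d_i = mu_i + rho * var_i that is common in HMM-based synthesis, and represents linear compression as uniform scaling of durations rather than as a waveform algorithm. All function names and numbers are hypothetical.

```python
import numpy as np


def scale_by_variance(means, variances, target_total):
    """Non-linear compression: d_i = mu_i + rho * var_i.

    rho is chosen so the durations sum to target_total; states with a
    larger variance absorb more of the change. Results can drop to zero
    or below at extreme rates, so a real system would need a floor.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    rho = (target_total - means.sum()) / variances.sum()
    return means + rho * variances


def interpolate_models(normal, fast, alpha):
    """Interpolate Gaussian duration parameters of two voices.

    normal and fast are (means, variances) pairs; alpha=0 gives the
    normal voice, alpha=1 the fast voice (assumed linear interpolation).
    """
    mu_n, var_n = (np.asarray(x, dtype=float) for x in normal)
    mu_f, var_f = (np.asarray(x, dtype=float) for x in fast)
    mu = (1.0 - alpha) * mu_n + alpha * mu_f
    var = (1.0 - alpha) * var_n + alpha * var_f
    return mu, var


def linear_compress(durations, rate):
    """Stand-in for linear compression: every segment shrinks by the same factor."""
    return np.asarray(durations, dtype=float) / rate


# Hypothetical three-state example (durations in frames).
normal_model = ([10, 20, 30], [1, 4, 5])
fast_model = ([6, 10, 20], [1, 2, 3])

nonlinear = scale_by_variance(*normal_model, target_total=40)
halfway_means, halfway_vars = interpolate_models(normal_model, fast_model, 0.5)
linear = linear_compress(normal_model[0], rate=1.5)
```

With these made-up numbers, both the variance-scaled and the linearly compressed durations sum to 40 frames, but the variance-scaled version shortens high-variance states proportionally more, which is the sense in which that method is non-linear.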
Pages: 52-64
Number of pages: 13