Predicting the Intelligibility of Vocoded Speech

Cited by: 46
Authors
Chen, Fei [1]
Loizou, Philipos C. [1]
Affiliation
[1] Univ Texas Dallas, Dept Elect Engn, Richardson, TX 75080 USA
Source
EAR AND HEARING | 2011 / Vol. 32 / Issue 03
Keywords
NORMAL-HEARING LISTENERS; ELECTRIC HEARING; TEMPORAL CUES; PHONEME RECOGNITION; SIGNAL PROCESSORS; ACOUSTIC HEARING; NOISE; ENVELOPE; CHANNELS; INDEX;
DOI
10.1097/AUD.0b013e3181ff3515
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Objectives: The purpose of this study is to evaluate the performance of a number of speech intelligibility indices in terms of predicting the intelligibility of vocoded speech.
Design: Noise-corrupted sentences were vocoded in a total of 80 conditions, involving three different signal-to-noise ratio levels (-5, 0, and 5 dB) and two types of maskers (steady state noise and two-talker). Tone-vocoder simulations and combined electric-acoustic stimulation (EAS) simulations were used. The vocoded sentences were presented to normal-hearing listeners for identification, and the resulting intelligibility scores were used to assess the correlation of various speech intelligibility measures. These included measures designed to assess speech intelligibility, including the speech transmission index (STI) and articulation index based measures, as well as distortions in hearing aids (e.g., coherence-based measures). These measures employed primarily either the temporal-envelope or the spectral-envelope information in the prediction model. The underlying hypothesis in the present study is that measures that assess temporal-envelope distortions, such as those based on the STI, should correlate highly with the intelligibility of vocoded speech. This is based on the fact that vocoder simulations preserve primarily envelope information, similar to the processing implemented in current cochlear implant speech processors. Similarly, it is hypothesized that measures such as the coherence-based index that assess the distortions present in the spectral envelope could also be used to model the intelligibility of vocoded speech.
Results: Of all the intelligibility measures considered, the coherence-based and the STI-based measures performed the best. High correlations (r = 0.9 to 0.96) were maintained with the coherence-based measures in all noisy conditions. The highest correlation obtained with the STI-based measure was 0.92, and that was obtained when high modulation rates (100 Hz) were used. The performance of these measures remained high in both steady-noise and fluctuating masker conditions. The correlations with conditions involving tone-vocoded speech were found to be a bit higher than the correlations with conditions involving EAS-vocoded speech.
Conclusions: The present study demonstrated that some of the speech intelligibility indices that have been found previously to correlate highly with wideband speech can also be used to predict the intelligibility of vocoded speech. Both the coherence-based and STI-based measures have been found to be good measures for modeling the intelligibility of vocoded speech. The highest correlation (r = 0.96) was obtained with a derived coherence measure that placed more emphasis on information contained in vowel/consonant spectral transitions and less emphasis on information contained in steady sonorant segments. High (100 Hz) modulation rates were found to be necessary in the implementation of the STI-based measures for better modeling of the intelligibility of vocoded speech. We believe that the difference in modulation rates needed for modeling the intelligibility of wideband versus vocoded speech can be attributed to the increased importance of higher modulation rates in situations where the amount of spectral information available to the listeners is limited (eight channels in our study). Unlike the traditional STI method that has been found to perform poorly in terms of predicting the intelligibility of processed speech wherein nonlinear operations are involved, the STI-based measure used in the present study has been found to perform quite well. In summary, the present study took the first step in modeling the intelligibility of vocoded speech. Access to such intelligibility measures is of high significance as they can be used to guide the development of new speech coding algorithms for cochlear implants.
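The abstract evaluates each objective measure by its correlation (r) with listener intelligibility scores across conditions. A minimal sketch of that evaluation step follows, assuming r is Pearson's correlation coefficient; the function name `pearson_r` and the sample values are hypothetical illustrations, not the authors' code or data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, one pair per test condition (NOT data from the study):
# an objective index value and the mean percent-correct score from listeners.
index_values = [0.10, 0.25, 0.40, 0.55, 0.70]
percent_correct = [12.0, 30.0, 47.0, 70.0, 85.0]

r = pearson_r(index_values, percent_correct)
print(f"r = {r:.3f}")
```

In practice the two lists would hold one entry per vocoded condition, so a measure that tracks intelligibility well yields r close to 1.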
Pages: 331-338
Page count: 8