Development of a Phrase-Based Speech-Recognition Test Using Synthetic Speech

Cited by: 0
Authors
Ibelings, Saskia [1, 2, 3]
Brand, Thomas [2, 3]
Ruigendijk, Esther [3, 4]
Holube, Inga [1, 3]
Affiliations
[1] Jade Univ Appl Sci, Inst Hearing Technol & Audiol, Ofener Str 16-19, D-26121 Oldenburg, Germany
[2] Carl von Ossietzky Univ Oldenburg, Med Phys, Oldenburg, Germany
[3] Cluster Excellence Hearing4All, Oldenburg, Germany
[4] Carl von Ossietzky Univ Oldenburg, Dept Dutch, Oldenburg, Germany
Source
TRENDS IN HEARING | 2024 / Vol. 28
Keywords
text-to-speech; speech recognition; speech test; audiology; speech intelligibility; synthetic speech; phrase test; RECEPTION THRESHOLD; SENTENCE TEST; INTELLIGIBILITY; NOISE;
DOI
10.1177/23312165241261490
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Speech-recognition tests are widely used in both clinical and research audiology. The purpose of this study was to develop a novel speech-recognition test that combines concepts from different speech-recognition tests in order to reduce training effects while allowing for a large set of speech material. Each trial of the new test presents four different words in a meaningful construct with a fixed structure, the so-called phrase. Various free databases were used to select the words and to determine their frequency. Highly frequent nouns were grouped into thematic categories and combined with related adjectives and infinitives. After discarding inappropriate and unnatural combinations and eliminating duplications of (sub-)phrases, a total of 772 phrases remained. Subsequently, the phrases were synthesized using a text-to-speech system, which significantly reduces the effort compared to recordings with a real speaker. Speech-recognition scores for the phrases were measured with 31 normal-hearing participants at fixed signal-to-noise ratios (SNR). After excluding outliers, the resulting speech-recognition thresholds (SRT) varied by up to 4 dB across phrases. The median SRT was -9.1 dB SNR and thus comparable to existing sentence tests. The slope of the psychometric function, 15 percentage points per dB, is also comparable and enables efficient use in audiology. In summary, the principle of creating speech material in a modular system has many potential applications.
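To make the two reported parameters concrete, the following is a minimal sketch of how a median SRT of -9.1 dB SNR and a slope of 15 percentage points per dB could describe a psychometric function. It assumes a logistic model parameterized by the SRT (the 50% point) and the slope at the SRT; the abstract does not state which model the authors fitted, and the function names `psychometric` and `snr_for_score` are hypothetical.

```python
import math

# Values taken from the abstract; the logistic form itself is an assumption.
MEDIAN_SRT_DB = -9.1   # dB SNR at which 50% of the speech is recognized
SLOPE_PER_DB = 0.15    # slope at the SRT, as a proportion per dB (15 pp/dB)


def psychometric(snr_db, srt_db=MEDIAN_SRT_DB, slope_per_db=SLOPE_PER_DB):
    """Proportion correct at snr_db under a logistic model.

    The factor 4 makes the derivative at snr_db == srt_db equal slope_per_db.
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope_per_db * (snr_db - srt_db)))


def snr_for_score(score, srt_db=MEDIAN_SRT_DB, slope_per_db=SLOPE_PER_DB):
    """Inverse of psychometric(): SNR in dB giving a score strictly between 0 and 1."""
    return srt_db + math.log(score / (1.0 - score)) / (4.0 * slope_per_db)


if __name__ == "__main__":
    for snr in (-13.0, -11.0, -9.1, -7.0, -5.0):
        print(f"{snr:6.1f} dB SNR -> {100 * psychometric(snr):5.1f} % correct")
```

Under this assumed model, a steep slope means that a small change in SNR produces a large change in score, which is why a steep psychometric function allows the SRT to be estimated with few trials.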
Pages: 13
Related papers
50 in total
  • [1] Using Prosodic Phrase-Based VQVAE on Audio ALBERT for Speech Emotion Recognition
    Hsu, Jia-Hao
    Wu, Chung-Hsien
    Yang, Tsung-Hsien
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 415 - 419
  • [2] Statistical phrase-based speech translation
    Mathias, Lambert
    Byrne, William
    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 561 - 564
  • [3] NESTING HIERARCHICAL PHRASE-BASED MODEL FOR SPEECH-TO-SPEECH TRANSLATION
    Fu, Xiaoyin
    Wei, Wei
    Fan, Lichun
    Lu, Shixiang
    Xu, Bo
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 368 - 372
  • [4] Phrase-based part-of-speech tagging
    Finch, Andrew
    Sumita, Eiichiro
    PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07), 2007, : 215 - +
  • [5] A Hybrid Phrase-based/Statistical Speech Translation System
    Stallard, David
    Choi, Fred
    Krstovski, Kriste
    Natarajan, Prem
    Prasad, Rohit
    Saleem, Shirin
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 757 - 760
  • [6] Measuring Speech Recognition With a Matrix Test Using Synthetic Speech
    Nuesse, Theresa
    Wiercinski, Bianca
    Brand, Thomas
    Holube, Inga
    TRENDS IN HEARING, 2019, 23
  • [7] SPEECH-RECOGNITION PRODUCTS
    GALLANT, JA
    EDN, 1989, 34 (02) : 112 - &
  • [8] IMPLICATIONS OF SPEECH-RECOGNITION STUDIES
    FLEMING, L
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1970, 47 (06): : 1612 - &
  • [9] Speech-Recognition Technology for Computers
    Tom Kramer
    Robert Kennedy
    Academic Psychiatry, 1999, 23 : 48 - 50
  • [10] Phrase-based translation of speech recognizer word lattices using loglinear model combination
    Matusov, E
    Ney, H
    Schlüter, R
    2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005, : 110 - 115