Development of a Phrase-Based Speech-Recognition Test Using Synthetic Speech

Cited by: 0

Authors:

Ibelings, Saskia [1,2,3]

Brand, Thomas [2,3]

Ruigendijk, Esther [3,4]

Holube, Inga [1,3]

Affiliations:

[1] Jade Univ Appl Sci, Inst Hearing Technol & Audiol, Ofener Str 16-19, D-26121 Oldenburg, Germany

[2] Carl von Ossietzky Univ Oldenburg, Med Phys, Oldenburg, Germany

[3] Cluster Excellence Hearing4All, Oldenburg, Germany

[4] Carl von Ossietzky Univ Oldenburg, Dept Dutch, Oldenburg, Germany

Source:

TRENDS IN HEARING | 2024 / Vol. 28

Keywords:

text-to-speech; speech recognition; speech test; audiology; speech intelligibility; synthetic speech; phrase test; RECEPTION THRESHOLD; SENTENCE TEST; INTELLIGIBILITY; NOISE;

DOI:

10.1177/23312165241261490

Chinese Library Classification (CLC) numbers:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

Speech-recognition tests are widely used in both clinical and research audiology. The purpose of this study was the development of a novel speech-recognition test that combines concepts of different speech-recognition tests to reduce training effects and allows for a large set of speech material. The new test consists of four different words per trial in a meaningful construct with a fixed structure, the so-called phrases. Various free databases were used to select the words and to determine their frequency. Highly frequent nouns were grouped into thematic categories and combined with related adjectives and infinitives. After discarding inappropriate and unnatural combinations, and eliminating duplications of (sub-)phrases, a total number of 772 phrases remained. Subsequently, the phrases were synthesized using a text-to-speech system. The synthesis significantly reduces the effort compared to recordings with a real speaker. After excluding outliers, measured speech-recognition scores for the phrases with 31 normal-hearing participants at fixed signal-to-noise ratios (SNR) revealed speech-recognition thresholds (SRT) for each phrase varying up to 4 dB. The median SRT was -9.1 dB SNR and thus comparable to existing sentence tests. The psychometric function's slope of 15 percentage points per dB is also comparable and enables efficient use in audiology. Summarizing, the principle of creating speech material in a modular system has many potential applications.
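
To make the two reported figures concrete, the following is a minimal sketch of a psychometric function with the median SRT of -9.1 dB SNR and a slope of 15 percentage points per dB at the SRT. The logistic form, the function name `recognition_probability`, and the constant names are assumptions for illustration; the abstract does not state which model the authors fitted.

```python
import math

# Values taken from the abstract.
SRT_DB = -9.1         # median speech-recognition threshold (dB SNR, 50 % point)
SLOPE_PER_DB = 0.15   # slope at the SRT (15 percentage points per dB)


def recognition_probability(snr_db, srt_db=SRT_DB, slope=SLOPE_PER_DB):
    """Assumed logistic psychometric function.

    The factor 4 makes `slope` equal to the derivative at the SRT,
    because the logistic curve has derivative 1/4 at its midpoint.
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope * (snr_db - srt_db)))


if __name__ == "__main__":
    # Print the predicted recognition score at a few illustrative SNRs.
    for snr in (-13.0, -11.0, -9.1, -7.0, -5.0):
        score = 100.0 * recognition_probability(snr)
        print(f"{snr:6.1f} dB SNR -> {score:5.1f} %")
```

Under this assumed model, the score is 50 % at the SRT and changes by about 15 percentage points for each dB close to that point, which is why a steep slope allows the threshold to be estimated from few trials.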


Pages: 13
