Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech

Authors
Zhao, Robin [1]
Choi, Anna S. G. [2]
Koenecke, Allison [2]
Rameau, Anais [1]
Affiliations
[1] Weill Cornell Med Coll, Sean Parker Inst Voice, New York, NY USA
[2] Cornell Univ, Dept Informat Sci, Ithaca, NY USA
Source
LARYNGOSCOPE | 2025 / Vol. 135 / Issue 01
Keywords
artificial intelligence; voice; DEAF SPEECH; INTELLIGIBILITY; CHILDREN; PERCEPTION; SKILLS
DOI
10.1002/lary.31713
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Objective: To evaluate the performance of commercial automatic speech recognition (ASR) systems on d/Deaf and hard-of-hearing (d/Dhh) speech.

Methods: A corpus containing 850 audio files of d/Dhh and normal hearing (NH) speech from the University of Memphis Speech Perception Assessment Laboratory was tested on four speech-to-text application program interfaces (APIs): Amazon Web Services, Microsoft Azure, Google Chirp, and OpenAI Whisper. We quantified the Word Error Rate (WER) of API transcriptions for 24 d/Dhh and nine NH participants and performed subgroup analysis by speech intelligibility classification (SIC), hearing loss (HL) onset, and primary communication mode.

Results: Mean WER averaged across APIs was about 10 times higher for the d/Dhh group (52.6%) than for the NH group (5.0%). APIs performed significantly worse for the "low" and "medium" SIC groups (85.9% and 46.6% WER, respectively) than for the "high" SIC group (9.5% WER, comparable to the NH group). APIs performed significantly worse for speakers with prelingual HL than for those with postlingual HL (80.5% and 37.1% WER, respectively). APIs performed significantly worse for speakers primarily communicating with sign language (70.2% WER) than for speakers using both oral and sign language communication (51.5%) or oral communication only (19.7%).

Conclusion: Commercial ASR systems underperform for d/Dhh individuals, especially those with "low" and "medium" SIC, prelingual onset of HL, and sign language as primary communication mode. This contrasts with Big Tech companies' promises of accessibility, indicating the need for ASR systems ethically trained on heterogeneous d/Dhh speech data.

Level of Evidence: 3. Laryngoscope, 2024
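The abstract reports results as Word Error Rate but does not spell out the computation. As a rough illustration only, WER is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below assumes simple lowercasing and whitespace tokenization; the function name and the example sentences are hypothetical, and the text normalization the authors actually applied is not stated in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: lowercase + whitespace tokenization. The study's own
    normalization rules are not given in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over six words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{example_wer:.1%}")
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.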
Pages: 191-197
Number of pages: 7