Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers

Authors
Santiago Omar Caballero Morales
Stephen J. Cox
Affiliations
[1] University of East Anglia, Speech, Language, and Music Group, School of Computing Sciences
Keywords
Recognition Accuracy; Confusion Matrix; Automatic Speech Recognition; Acoustic Model; Speech Disorder
DOI
Not available
Abstract
Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.
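The idea of correcting phonetic errors with a confusion matrix and a language model can be illustrated with a small noisy-channel sketch. This is not the authors' metamodel or transducer cascade: it is a simplified stand-in that handles substitutions only, and the phoneme set, confusion probabilities, lexicon, word priors, and the function names `score` and `correct` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical confusion probabilities P(recognised phoneme | intended phoneme).
# In practice these would be estimated from a speaker's recognition errors.
CONFUSION = {
    "b": {"b": 0.6, "p": 0.4},
    "p": {"p": 0.9, "b": 0.1},
    "ae": {"ae": 1.0},
    "t": {"t": 0.7, "d": 0.3},
    "d": {"d": 0.8, "t": 0.2},
}

# Hypothetical pronunciation lexicon and unigram word priors
# (the priors stand in for a language model).
LEXICON = {
    "bat": ["b", "ae", "t"],
    "pat": ["p", "ae", "t"],
    "bad": ["b", "ae", "d"],
    "pad": ["p", "ae", "d"],
}
WORD_PRIOR = {"bat": 0.4, "pat": 0.1, "bad": 0.4, "pad": 0.1}


def score(word, observed, floor=1e-6):
    """Return log P(word) + sum of log P(observed_i | intended_i).

    Only substitutions are modelled, so a word whose pronunciation
    differs in length from the observed sequence scores -infinity.
    """
    intended = LEXICON[word]
    if len(intended) != len(observed):
        return -math.inf
    logp = math.log(WORD_PRIOR[word])
    for i, o in zip(intended, observed):
        # Unseen confusions get a small floor probability instead of zero.
        logp += math.log(CONFUSION.get(i, {}).get(o, floor))
    return logp


def correct(observed):
    """Pick the lexicon word that best explains the observed phonemes."""
    return max(LEXICON, key=lambda w: score(w, observed))


if __name__ == "__main__":
    print(correct(["p", "ae", "t"]))
```

With these made-up numbers, the recognised sequence p-ae-t is mapped to "bat" rather than "pat", because this hypothetical speaker often produces /b/ as /p/ and "bat" has the higher prior. A fuller treatment would also need to model insertions and deletions and decode whole word sequences.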