Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Cited by: 12
Authors
Korzekwa, Daniel [1]
Barra-Chicote, Roberto [1]
Kostek, Bozena [2]
Drugman, Thomas [1]
Lajszczak, Mateusz [1]
Affiliations
[1] Amazon TTS Res, Cambridge, England
[2] Gdansk Univ Technol, Fac ETI, Gdansk, Poland
Source
INTERSPEECH 2019
Keywords
dysarthria detection; speech recognition; speech synthesis; interpretable deep learning models
DOI
10.21437/Interspeech.2019-1206
CLC number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve the dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. Neural networks are commonly regarded as black boxes that solve problems without providing interpretable outputs. In contrast, we show that this latent space successfully encodes interpretable characteristics of dysarthria, that it is effective at detecting dysarthria, and that manipulating the latent space allows the model to reconstruct healthy speech from dysarthric speech. This work can help patients and speech pathologists improve their understanding of the condition, lead to more accurate diagnoses, and aid in reconstructing healthy speech for afflicted patients.
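This record does not describe the network architecture or the training objective. As a rough illustration of the two ideas named in the abstract, the following sketch assumes that the joint objective combines a reconstruction mean-squared error with an `alpha`-weighted binary cross-entropy detection term. It also assumes that latent-space manipulation can be modelled as a linear shift of a latent vector from the dysarthric class centroid toward the healthy one. All names (`multitask_loss`, `centroid`, `shift_toward_healthy`, `alpha`, `strength`) are hypothetical. This is not the authors' method.

```python
import math


def multitask_loss(x, x_hat, p_dysarthric, label, alpha=1.0, eps=1e-12):
    """Hypothetical joint objective: reconstruction MSE + alpha * detection BCE.

    x, x_hat:     target and reconstructed feature vectors (e.g. one frame)
    p_dysarthric: detector output, a probability in [0, 1]
    label:        1 for dysarthric speech, 0 for healthy speech
    alpha:        assumed weight balancing the two tasks
    """
    if len(x) != len(x_hat) or not x:
        raise ValueError("x and x_hat must be non-empty and equal in length")
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # Clamp the probability away from 0 and 1 so log() stays finite.
    p = min(max(p_dysarthric, eps), 1.0 - eps)
    bce = -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return mse + alpha * bce


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length latent vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def shift_toward_healthy(z, healthy_latents, dysarthric_latents, strength=1.0):
    """Move latent z along the dysarthric-to-healthy centroid direction.

    strength=1.0 shifts z by the full centroid difference; smaller values
    apply only a partial correction. This linear shift is an assumption.
    """
    h = centroid(healthy_latents)
    d = centroid(dysarthric_latents)
    return [zi + strength * (hi - di) for zi, hi, di in zip(z, h, d)]


if __name__ == "__main__":
    healthy = [[0.0, 0.0], [0.0, 1.0]]
    dysarthric = [[2.0, 1.0], [4.0, 1.0]]
    print(shift_toward_healthy([3.0, 1.0], healthy, dysarthric))
    print(multitask_loss([1.0, 2.0], [1.0, 2.0], 0.5, 1))
```

With these toy latents, a dysarthric latent at `[3.0, 1.0]` (the dysarthric centroid) is mapped onto the healthy centroid `[0.0, 0.5]`. In a full system, the shifted latent would then be passed to the decoder to synthesize corrected speech. A linear centroid shift is used here only because it is the simplest interpretable manipulation.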
Pages: 3890-3894
Number of pages: 5
Related papers (50 in total)
  • [32] Improving Speech to Text Alignment Based on Repetition Detection for Dysarthric Speech
    Diwakar, G.
    Karjigi, Veena
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2020, 39 (11): 5543-5567
  • [33] Interpretable Deep Learning Applied to Rip Current Detection and Localization
    Rampal, Neelesh
    Shand, Tom
    Wooler, Adam
    Rautenbach, Christo
    REMOTE SENSING, 2022, 14 (23)
  • [34] Interpretable Detection of Partial Discharge in Power Lines with Deep Learning
    Michau, Gabriel
    Hsu, Chi-Ching
    Fink, Olga
    SENSORS, 2021, 21 (06): 1-14
  • [35] An enhanced interpretable deep learning approach for diabetic retinopathy detection
    Alrajjou, Soha
    Boahen, Edward Kwadwo
    Menga, Chunyun
    Cheng, Keyang
    2022 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY, CYBERC, 2022: 127-135
  • [36] Deep Multi-task Learning for Interpretable Glaucoma Detection
    Mojab, Nooshin
    Noroozi, Vahid
    Yu, Philip S.
    Hallak, Joelle A.
    2019 IEEE 20TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2019), 2019: 167-174
  • [37] Perceptual Learning of Dysarthric Speech: A Review of Experimental Studies
    Borrie, Stephanie A.
    McAuliffe, Megan J.
    Liss, Julie M.
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2012, 55 (01): 290-305
  • [38] AUTOMATIC DETECTION OF VOICE ONSET TIME IN DYSARTHRIC SPEECH
    Novotny, Michal
    Pospisil, Jakub
    Cmejla, Roman
    Rusz, Jan
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 4340-4344
  • [40] Deep neural network architectures for dysarthric speech analysis and recognition
    Zaidi, Brahim Fares
    Selouani, Sid Ahmed
    Boudraa, Malika
    Sidi Yakoub, Mohammed
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (15): 9089-9108