Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

Cited by: 12

Authors:

Korzekwa, Daniel ^{[1]}

Barra-Chicote, Roberto ^{[1]}

Kostek, Bozena ^{[2]}

Drugman, Thomas ^{[1]}

Lajszczak, Mateusz ^{[1]}

Affiliations:

[1] Amazon TTS Res, Cambridge, England

[2] Gdansk Univ Technol, Fac ETI, Gdansk, Poland

Source:

INTERSPEECH 2019 | 2019

Keywords:

dysarthria detection; speech recognition; speech synthesis; interpretable deep learning models

DOI:

10.21437/Interspeech.2019-1206

Chinese Library Classification (CLC) codes:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

We present a novel deep learning model for the detection and reconstruction of dysarthric speech. We train the model with a multi-task learning technique to jointly solve dysarthria detection and speech reconstruction tasks. The model's key feature is a low-dimensional latent space that is meant to encode the properties of dysarthric speech. It is commonly believed that neural networks are black boxes that solve problems but do not provide interpretable outputs. On the contrary, we show that this latent space successfully encodes interpretable characteristics of dysarthria, is effective at detecting dysarthria, and that manipulation of the latent space allows the model to reconstruct healthy speech from dysarthric speech. This work can help patients and speech pathologists improve their understanding of the condition, lead to more accurate diagnoses, and aid in reconstructing healthy speech for afflicted patients.


Pages: 3890-3894

Page count: 5
