Automatic speaker verification system for dysarthric speakers using prosodic features and out-of-domain data augmentation

Cited by: 3
Authors
Salim, Shinimol [1]
Shahnawazuddin, Syed [2]
Ahmad, Waquar [1]
Affiliations
[1] Natl Inst Technol, Elect & Commun Dept, Calicut 673601, India
[2] Natl Inst Technol, Elect & Commun Dept, Patna 800005, India
Keywords
Automatic speaker verification system; Dysarthria; Duration modification based data augmentation; MFCC; Prosody; i-vector; x-vector; SPEECH; LOUDNESS;
DOI
10.1016/j.apacoust.2023.109412
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A communication disorder is an impairment of a person's ability to talk or communicate appropriately. Dysarthria is a common neuro-motor speech communication disorder that can be caused by neurological damage. It may affect a speaker's articulation, phonation, and prosody. Dysarthria patients have poor neuromotor coordination and other physical impairments, making it difficult for them to use an interactive keyboard or other user interfaces. An automatic speaker verification (ASV) system can make biometric applications more accessible to dysarthric speakers by eliminating the need for them to remember cumbersome and unique authentication numbers and passwords. In this paper, we present a study on developing an ASV system for dysarthria patients with varying speech intelligibility to assist them in remote access control and voice-based biometric applications. In the initial part of our proposed approach, we include a duration-modification-based data augmentation module in the front end of the ASV system. Since prosody deficits are one of the early indicators of dysarthria, we investigate the role of prosodic variables in combination with the traditional Mel-frequency cepstral coefficients (MFCC). The prosodic variables explored in this study include pitch, loudness, and voicing probability. Separate i-vector and x-vector models are trained and compared using MFCC alone, prosodic variables alone, and their combinations. The experimental results show that the proposed approach, which combines MFCC and prosody features with duration-modification-based data augmentation, produces promising results. © 2023 Elsevier Ltd. All rights reserved.
Pages: 16
Related papers
50 in total
  • [1] IN-DOMAIN AND OUT-OF-DOMAIN DATA AUGMENTATION TO IMPROVE CHILDREN'S SPEAKER VERIFICATION SYSTEM IN LIMITED DATA SCENARIO
    Shahnawazuddin, S.
    Ahmad, Waquar
    Adiga, Nagaraj
    Kumar, Avinash
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7554 - 7558
  • [2] Improving Speaker Verification Performance in Presence of Spoofing Attacks Using Out-of-Domain Spoofed Data
    Sarkar, Achintya Kr.
    Sahidullah, Md.
    Tan, Zheng-Hua
    Kinnunen, Tomi
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2611 - 2615
  • [3] Combined approach to dysarthric speaker verification using data augmentation and feature fusion
    Salim, Shinimol
    Shahnawazuddin, Syed
    Ahmad, Waquar
    SPEECH COMMUNICATION, 2024, 160
  • [4] CONTEXTUAL OUT-OF-DOMAIN UTTERANCE HANDLING WITH COUNTERFEIT DATA AUGMENTATION
    Lee, Sungjin
    Shalyminov, Igor
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7205 - 7209
  • [5] Automatic speaker recognition using LVQ with 3 prosodic features
    Ouamour-Sayoud, S
    Sayoud, H
    INTELLIGENT AND ADAPTIVE SYSTEMS AND SOFTWARE ENGINEERING, 2004, : 95 - 99
  • [6] Autoencoder-based Semi-Supervised Curriculum Learning For Out-of-domain Speaker Verification
    Zheng, Siqi
    Liu, Gang
    Suo, Hongbin
    Lei, Yun
    INTERSPEECH 2019, 2019, : 4360 - 4364
  • [7] An Automatic Diagnosis and Assessment of Dysarthric Speech using Speech Disorder Specific Prosodic Features
    Vyas, Garima
    Dutta, Malay Kishore
    Prinosil, Jiri
    Harar, Pavol
    2016 39TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2016, : 515 - 518
  • [8] Improving Children's Speech Recognition through Out-of-Domain Data Augmentation
    Fainberg, Joachim
    Bell, Peter
    Lincoln, Mike
    Renals, Steve
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1598 - 1602
  • [9] Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech
    Christensen, H.
    Aniol, M. B.
    Bell, P.
    Green, P.
    Hain, T.
    King, S.
    Swietojanski, P.
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3609 - 3612
  • [10] Using out-of-domain data to improve on-domain language models
    Iyer, R
    Ostendorf, M
    Gish, H
    IEEE SIGNAL PROCESSING LETTERS, 1997, 4 (08) : 221 - 223