Automatic speaker verification system for dysarthric speakers using prosodic features and out-of-domain data augmentation

Cited by: 3
Authors
Salim, Shinimol [1]
Shahnawazuddin, Syed [2]
Ahmad, Waquar [1]
Affiliations
[1] Natl Inst Technol, Elect & Commun Dept, Calicut 673601, India
[2] Natl Inst Technol, Elect & Commun Dept, Patna 800005, India
Keywords
Automatic speaker verification system; Dysarthria; Duration modification based data augmentation; MFCC; Prosody; i-vector; x-vector; SPEECH; LOUDNESS
DOI
10.1016/j.apacoust.2023.109412
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A communication disorder is an impairment of a person's ability to speak or communicate appropriately. Dysarthria is a common neuromotor speech disorder that can be caused by neurological damage, and it may affect a speaker's articulation, phonation, and prosody. People with dysarthria often have poor neuromotor coordination and other physical impairments that make keyboards and other interactive user interfaces difficult to use. An automatic speaker verification (ASV) system can make biometric applications more accessible to dysarthric speakers by removing the need to remember cumbersome, unique authentication numbers and passwords. In this paper, we present a study on developing an ASV system for dysarthric speakers with varying speech intelligibility, to assist them with remote access control and voice-based biometric applications. First, we add a duration-modification-based data augmentation module to the front end of the ASV system. Second, because prosodic deficits are among the early indicators of dysarthria, we investigate the role of prosodic variables (pitch, loudness, and voicing probability) in combination with conventional Mel-frequency cepstral coefficients (MFCC). Separate i-vector and x-vector models are trained and compared using MFCC alone, prosodic variables alone, and their combinations. The experimental results show that the proposed approach, which combines MFCC and prosodic features with duration-modification-based data augmentation, produces promising results. © 2023 Elsevier Ltd. All rights reserved.
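The abstract does not say how the prosodic streams are extracted or combined with MFCC. A minimal sketch, assuming frame-level concatenation of precomputed MFCC vectors with three per-frame prosodic values, could look like the following. The autocorrelation pitch tracker, the RMS-in-dB loudness proxy, the voicing threshold, and all function names (`prosody_frames`, `fuse`, etc.) are hypothetical illustrations, not the authors' pipeline; the duration-modification augmentation step is not shown.

```python
import math
from typing import List, Sequence, Tuple


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a waveform into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def loudness_db(frame: Sequence[float], eps: float = 1e-10) -> float:
    """Loudness proxy (assumption): RMS energy in dB, not a perceptual loudness model."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms + eps)


def pitch_and_voicing(frame: Sequence[float], sr: int, fmin: float = 60.0,
                      fmax: float = 400.0, voiced_threshold: float = 0.3) -> Tuple[float, float]:
    """Autocorrelation pitch (Hz) and a voicing score in [0, 1].

    The voicing score is the highest normalized autocorrelation value over
    lags [sr/fmax, sr/fmin]; pitch is reported as 0 Hz when that score is
    below voiced_threshold (hypothetical value, treated as unvoiced).
    """
    n = len(frame)
    mean = sum(frame) / n
    f = [s - mean for s in frame]          # remove DC offset
    r0 = sum(s * s for s in f)             # zero-lag energy used for normalization
    if r0 == 0.0:                          # silent frame
        return 0.0, 0.0
    lag_min = max(1, int(sr / fmax))
    lag_max = min(int(sr / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(f[i] * f[i + lag] for i in range(n - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    voicing = min(1.0, best_r)
    pitch = sr / best_lag if best_lag and voicing >= voiced_threshold else 0.0
    return pitch, voicing


def prosody_frames(x: Sequence[float], sr: int, frame_len: int, hop: int) -> List[List[float]]:
    """Per-frame [pitch_hz, loudness_db, voicing_score] vectors."""
    out = []
    for fr in frame_signal(x, frame_len, hop):
        pitch, voicing = pitch_and_voicing(fr, sr)
        out.append([pitch, loudness_db(fr), voicing])
    return out


def fuse(mfcc: Sequence[Sequence[float]],
         prosody: Sequence[Sequence[float]]) -> List[List[float]]:
    """Frame-level concatenation of MFCC and prosodic vectors (assumed fusion scheme)."""
    if len(mfcc) != len(prosody):
        raise ValueError("MFCC and prosody streams must have the same number of frames")
    return [list(m) + list(p) for m, p in zip(mfcc, prosody)]


# Usage: a 0.5 s, 200 Hz tone at 8 kHz, 50 ms frames with a 25 ms hop.
sr = 8000
x = [0.5 * math.sin(2 * math.pi * 200 * t / sr) for t in range(sr // 2)]
pros = prosody_frames(x, sr, frame_len=400, hop=200)
mfcc = [[0.0] * 13 for _ in pros]  # placeholder for 13 precomputed MFCCs per frame
fused = fuse(mfcc, pros)           # 16-dimensional vectors per frame
```

Each fused vector would then be the per-frame input to an i-vector or x-vector extractor. Simple concatenation keeps the MFCC and prosodic streams frame-aligned, which is why `fuse` rejects streams of different lengths.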
Pages: 16