Automatic speaker verification system for dysarthric speakers using prosodic features and out-of-domain data augmentation

Cited by: 3
Authors
Salim, Shinimol [1]
Shahnawazuddin, Syed [2]
Ahmad, Waquar [1]
Affiliations
[1] Natl Inst Technol, Elect & Commun Dept, Calicut 673601, India
[2] Natl Inst Technol, Elect & Commun Dept, Patna 800005, India
Keywords
Automatic speaker verification system; Dysarthria; Duration modification based data augmentation; MFCC; Prosody; i-vector; x-vector; SPEECH; LOUDNESS;
DOI
10.1016/j.apacoust.2023.109412
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
A communication disorder is an impairment of a person's ability to talk or communicate appropriately. Dysarthria is a common neuro-motor speech communication disorder that can be caused by neurological damage. Dysarthria may affect the articulation, phonation, and prosody of a speaker. Dysarthria patients have poor neuromotor coordination and other physical impairments, making it difficult for them to use an interactive keyboard or other user interfaces. An automatic speaker verification (ASV) system can make biometric applications more accessible to dysarthric speakers by eliminating the need for them to remember cumbersome and unique authentication numbers and passwords. In this paper, we present a study on developing an ASV system for dysarthria patients with varying speech intelligibility, to assist them in remote access control and voice-based biometric applications. In the initial part of our proposed approach, we include a duration-modification-based data augmentation module in the front end of the ASV system. Since prosody deficits are one of the early indicators of dysarthria, we investigate the role of prosodic variables in combination with the traditional Mel-frequency cepstral coefficients (MFCC). The prosodic variables explored in this study are pitch, loudness, and voicing probability. Separate i-vector and x-vector models are trained and compared using MFCC alone, prosodic variables alone, and their combinations. The experimental results show that the proposed approach, which combines MFCC and prosodic features with duration-modification-based data augmentation, produces promising results. © 2023 Elsevier Ltd. All rights reserved.
Pages: 16
Related papers
50 in total
  • [21] TRANSCRIPTION OF MULTI-GENRE MEDIA ARCHIVES USING OUT-OF-DOMAIN DATA
    Bell, P. J.
    Gales, M. J. F.
    Lanchantin, P.
    Liu, X.
    Long, Y.
    Renals, S.
    Swietojanski, P.
    Woodland, P. C.
    2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 324 - 329
  • [22] System Source and Dynamic Features for Speaker Verification for Limited Data Condition
    Kumari, T. R. Jayanthi
    Jayanna, H. S.
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2016, : 1458 - 1461
  • [23] Countermeasures for Automatic Speaker Verification Replay Spoofing Attack : On Data Augmentation, Feature Representation, Classification and Fusion
    Cai, Weicheng
    Cai, Danwei
    Liu, Wenbo
    Li, Gang
    Li, Ming
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 17 - 21
  • [24] RAWBOOST: A RAW DATA BOOSTING AND AUGMENTATION METHOD APPLIED TO AUTOMATIC SPEAKER VERIFICATION ANTI-SPOOFING
    Tak, Hemlata
    Kamble, Madhu
    Patino, Jose
    Todisco, Massimiliano
    Evans, Nicholas
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6382 - 6386
  • [25] Spoken English Assessment System for Non-Native Speakers Using Acoustic and Prosodic Features
    Shi, Qin
    Li, Kun
    Zhang, ShiLei
    Chu, Stephen M.
    Xiao, Ji
    Ou, ZhiJian
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1874 - +
  • [26] Using combined features to improve speaker verification in the face of limited reverberant data
    Al-Karawi K.A.
    Mohammed D.Y.
    International Journal of Speech Technology, 2023, 26 (03) : 789 - 799
  • [27] Robust Automatic Speaker Identification System Using Shuffled MFCC Features
    Barhoush, Mahdi
    Hallawa, Ahmed
    Schmeink, Anke
    2021 IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLIED NETWORK TECHNOLOGIES (ICMLANT II), 2021, : 28 - 33
  • [28] Speaker-specific-text based speaker verification system using spectral and phase based features
    Bharathi B.
    Bharathi, B. (bharathib@ssn.edu.in), 2017, Springer Science and Business Media, LLC (20) : 465 - 474
  • [29] Spoofing Detection in Automatic Speaker Verification Systems Using DNN Classifiers and Dynamic Acoustic Features
    Yu, Hong
    Tan, Zheng-Hua
    Ma, Zhanyu
    Martin, Rainer
    Guo, Jun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (10) : 4633 - 4644
  • [30] Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification
    Afshan, Amber
    Guo, Jinxi
    Park, Soo Jin
    Ravi, Vijay
    McCree, Alan
    Alwan, Abeer
    INTERSPEECH 2020, 2020, : 4318 - 4322