Automatic speaker verification system for dysarthric speakers using prosodic features and out-of-domain data augmentation

Cited by: 3

Authors:

Salim, Shinimol [1]

Shahnawazuddin, Syed [2]

Ahmad, Waquar [1]

Affiliations:

[1] Natl Inst Technol, Elect & Commun Dept, Calicut 673601, India

[2] Natl Inst Technol, Elect & Commun Dept, Patna 800005, India

Source:

APPLIED ACOUSTICS | 2023 / Volume 210

Keywords:

Automatic speaker verification system; Dysarthria; Duration modification based data augmentation; MFCC; Prosody; i-vector; x-vector; SPEECH; LOUDNESS;

DOI:

10.1016/j.apacoust.2023.109412

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

A communication disorder is an impairment of a person's ability to talk or communicate appropriately. Dysarthria is a common neuromotor speech communication disorder that can be caused by neurological damage. Dysarthria may affect the articulation, phonation, and prosody of a speaker. Dysarthria patients have poor neuromotor coordination and other physical impairments, making it difficult to utilize an interactive keyboard or other user interfaces. An automatic speaker verification (ASV) system can make biometric applications more accessible to dysarthric speakers by eliminating the need for them to remember cumbersome and unique authentication numbers and passwords. In this paper, we present a study on developing an ASV system for dysarthria patients with varying speech intelligibility to assist them in remote access control and voice-based biometric applications. In the initial part of our proposed approach, we included a duration-modification-based data augmentation module in the front end of the ASV system. Since prosody deficits are one of the early indicators of dysarthria, we investigated the role of prosodic variables in combination with the traditional Mel-frequency cepstral coefficients (MFCC). The prosodic variables explored in this study include pitch, loudness, and voicing probability. Separate i-vector and x-vector models are trained and compared using individual MFCC, prosodic variables, and their combinations. The experimental results showed that the proposed approach, which combines MFCC and prosody features with duration-modification-based data augmentation, produced promising results. © 2023 Elsevier Ltd. All rights reserved.


Pages: 16

50 items in total

[41] IMPROVING OUT-DOMAIN PLDA SPEAKER VERIFICATION USING UNSUPERVISED INTER-DATASET VARIABILITY COMPENSATION APPROACH
Kanagasundaram, Ahilan
Dean, David
Sridharan, Sridha
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4654 - 4658
[42] Selecting Augmentation Methods for Domain Generalization and Out-of-Distribution Detection Using Unlabeled Data
Kucuktas, Ulku Tuncer
Uysal, Fatih
Hardalac, Firat
32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024,
[43] Replay spoofing detection system for automatic speaker verification using multi-task learning of noise classes
Shim, Hye-Jin
Jung, Jee-Weon
Heo, Hee-Soo
Yoon, Sung-Hyun
Yu, Ha-Jin
2018 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2018, : 172 - 176
[44] An automatic speech recognition system in Odia language using attention mechanism and data augmentation
Malay Kumar Majhi
Sujan Kumar Saha
International Journal of Speech Technology, 2024, 27 (3) : 717 - 728
[45] Replay spoof detection for speaker verification system using magnitude-phase-instantaneous frequency and energy features
Bharath, K. P.
Kumar, M. Rajesh
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (27) : 39343 - 39366
[46] Replay spoof detection for speaker verification system using magnitude-phase-instantaneous frequency and energy features
K. P. Bharath
M. Rajesh Kumar
Multimedia Tools and Applications, 2022, 81 : 39343 - 39366
[47] Studying the Effectiveness of Data Augmentation and Frequency-Domain Linear Prediction Coefficients in Children's Speaker Verification Under Low-Resource Conditions
Aziz, Shahid
Pushp, Shivesh
Shahnawazuddin, Syed
SPEECH AND COMPUTER, SPECOM 2023, PT II, 2023, 14339 : 395 - 406
[48] Feature Extraction Approach for Speaker Verification to Support Healthcare System Using Blockchain Security for Data Privacy
Upadhyay, Shrikant
Kumar, Mohit
Kumar, Ashwani
Karnati, Ramesh
Mahommad, Gouse Baig
Althubiti, Sara A.
Alenezi, Fayadh
Polat, Kemal
COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2022, 2022
[49] Cross-Language Forensic Voice Comparison of Hong Kong Trilingual Speakers using Filled Pauses and an Automatic Speaker Recognition System
Cao, Grace Wenling
Hughes, Vincent
Wang, Bruce
Mok, Peggy
2024 IEEE 14TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, ISCSLP 2024, 2024, : 279 - 283
[50] A Domain-Independent Automatic Labeling System for Large-Scale Social Data Annotation Using Lexicon and Web-Based Augmentation
Khatoon, Shaheen
Abu Romman, Lamis
Hasan, Md Maruf
INFORMATION TECHNOLOGY AND CONTROL, 2020, 49 (01) : 36 - 54
