Exploring the Impact of Fine-Tuning the Wav2vec2 Model in Database-Independent Detection of Dysarthric Speech

Cited by: 3
Authors
Javanmardi, Farhad [1 ]
Kadiri, Sudarsana Reddy [1 ,2 ]
Alku, Paavo [1 ]
Affiliations
[1] Aalto Univ, Dept Informat & Commun Engn, FI-00076 Espoo, Finland
[2] Univ Southern Calif, Signal Anal & Interpretat Lab SAIL, Los Angeles, CA 90089 USA
Funding
Academy of Finland
Keywords
Dysarthria; fine-tuning; self-supervised learning; wav2vec 2.0; intelligibility; classification; speakers
DOI
10.1109/JBHI.2024.3392829
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline code: 0812
Abstract
Many acoustic features and machine learning models have been studied to build automatic detection systems to distinguish dysarthric speech from healthy speech. These systems can help to improve the reliability of diagnosis. However, speech recorded for diagnosis in real-life clinical conditions can differ from the training data of the detection system in terms of, for example, recording conditions, speaker identity, and language. These mismatches may lead to a reduction in detection performance in practical applications. In this study, we investigate the use of the wav2vec2 model as a feature extractor together with a support vector machine (SVM) classifier to build automatic detection systems for dysarthric speech. The performance of the wav2vec2 features is evaluated in two cross-database scenarios, language-dependent and language-independent, to study their generalizability to unseen speakers, recording conditions, and languages before and after fine-tuning the wav2vec2 model. The results revealed that the fine-tuned wav2vec2 features showed better generalization in both scenarios and gave an absolute accuracy improvement of 1.46%-8.65% compared to the non-fine-tuned wav2vec2 features.
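The pipeline the abstract describes (utterance-level wav2vec2 features fed to an SVM back-end) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the wav2vec2 frame embeddings are simulated here with random vectors (a real system would obtain them from a pretrained or fine-tuned wav2vec2 model), the 768-dimensional size and mean-pooling are common conventions rather than details taken from the paper, and the class shift is artificial so the toy task is learnable.

```python
# Sketch of a dysarthric-speech detection pipeline: utterance-level
# embeddings + SVM classifier. Frame embeddings are SIMULATED with random
# vectors standing in for wav2vec2 hidden states (loading the real model
# would require a network download); the 0.3 mean shift for the
# "dysarthric" class is artificial, purely to make the toy task learnable.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def mean_pool(frame_embeddings):
    """Collapse (num_frames, dim) frame embeddings into one
    utterance-level feature vector by averaging over time."""
    return frame_embeddings.mean(axis=0)

# Simulated corpus: 100 utterances, each with 50 frames of
# 768-dimensional embeddings (a typical wav2vec2-base hidden size).
X, y = [], []
for i in range(100):
    label = i % 2  # 0 = healthy, 1 = dysarthric (toy labels)
    frames = rng.normal(loc=0.3 * label, scale=1.0, size=(50, 768))
    X.append(mean_pool(frames))
    y.append(label)
X, y = np.asarray(X), np.asarray(y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = SVC(kernel="rbf", C=1.0)  # SVM back-end, as in the paper
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"toy detection accuracy: {acc:.2f}")
```

In the cross-database scenarios the paper studies, the training and test utterances would come from different corpora, so the split above would be replaced by training on one database and evaluating on another.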
Pages: 4951-4962
Page count: 12