Exploring the Impact of Fine-Tuning the Wav2vec2 Model in Database-Independent Detection of Dysarthric Speech

Cited by: 3
Authors
Javanmardi, Farhad [1 ]
Kadiri, Sudarsana Reddy [1 ,2 ]
Alku, Paavo [1 ]
Affiliations
[1] Aalto Univ, Dept Informat & Commun Engn, FI-00076 Espoo, Finland
[2] Univ Southern Calif, Signal Anal & Interpretat Lab SAIL, Los Angeles, CA 90089 USA
Funding
Academy of Finland
Keywords
Dysarthria; fine-tuning; self-supervised learning; wav2vec 2.0; intelligibility; classification; speakers
DOI
10.1109/JBHI.2024.3392829
Chinese Library Classification (CLC): TP [Automation Technology, Computer Technology]
Discipline code: 0812
Abstract
Many acoustic features and machine learning models have been studied to build automatic detection systems to distinguish dysarthric speech from healthy speech. These systems can help to improve the reliability of diagnosis. However, speech recorded for diagnosis in real-life clinical conditions can differ from the training data of the detection system in terms of, for example, recording conditions, speaker identity, and language. These mismatches may lead to a reduction in detection performance in practical applications. In this study, we investigate the use of the wav2vec2 model as a feature extractor together with a support vector machine (SVM) classifier to build automatic detection systems for dysarthric speech. The performance of the wav2vec2 features is evaluated in two cross-database scenarios, language-dependent and language-independent, to study their generalizability to unseen speakers, recording conditions, and languages before and after fine-tuning the wav2vec2 model. The results revealed that the fine-tuned wav2vec2 features showed better generalization in both scenarios and gave an absolute accuracy improvement of 1.46%-8.65% compared to the non-fine-tuned wav2vec2 features.
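The pipeline the abstract describes (utterance-level wav2vec2 features fed to an SVM back-end) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the wav2vec2 frame embeddings are simulated here with random vectors (a real system would obtain them from a pretrained or fine-tuned wav2vec2 model), the 768-dimensional size and mean-pooling are common conventions rather than details taken from the paper, and the class shift is artificial so the toy task is learnable.

```python
# Sketch of a dysarthric-speech detection pipeline: utterance-level
# embeddings + SVM classifier. Frame embeddings are SIMULATED with random
# vectors standing in for wav2vec2 hidden states (loading the real model
# would require a network download); the 0.3 mean shift for the
# "dysarthric" class is artificial, purely to make the toy task learnable.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def mean_pool(frame_embeddings):
    """Collapse (num_frames, dim) frame embeddings into one
    utterance-level feature vector by averaging over time."""
    return frame_embeddings.mean(axis=0)

# Simulated corpus: 100 utterances, each with 50 frames of
# 768-dimensional embeddings (a typical wav2vec2-base hidden size).
X, y = [], []
for i in range(100):
    label = i % 2  # 0 = healthy, 1 = dysarthric (toy labels)
    frames = rng.normal(loc=0.3 * label, scale=1.0, size=(50, 768))
    X.append(mean_pool(frames))
    y.append(label)
X, y = np.asarray(X), np.asarray(y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = SVC(kernel="rbf", C=1.0)  # SVM back-end, as in the paper
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"toy detection accuracy: {acc:.2f}")
```

In the cross-database scenarios the paper studies, the training and test utterances would come from different corpora, so the split above would be replaced by training on one database and evaluating on another.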
Pages: 4951-4962
Page count: 12