Category-based and Target-based Data Augmentation for Dysarthric Speech Recognition Using Transfer Learning

Cited by: 0
Authors
Nawroly, Sarkhell Sirwan [1 ]
Popescu, Decebal [1 ]
Antony, Mariya Celin Thekekara [2]
Affiliations
[1] Natl Univ Sci & Technol POLITEHN Bucharest, Fac Automat Control & Comp Sci, 313 Splaiul Independentei, Bucharest 060042, Romania
[2] Sai Univ, Sch Comp & Data Sci, Paiyanur 603104, Tamil Nadu, India
Source
STUDIES IN INFORMATICS AND CONTROL | 2024, Vol. 33, No. 04
Keywords
Dysarthric speech recognition; Noise analysis; Transfer learning approach; NOISE;
DOI
10.24846/v33i4y202408
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Dysarthric speech recognition poses unique challenges compared with normal speech recognition systems due to the scarcity of dysarthric speech data. To address this data sparsity issue, researchers have developed data augmentation techniques that use either the original dysarthric speech examples or speech data from normal speakers to generate new dysarthric speech data, thereby improving dysarthric speech recognition performance. This study uses dysarthric speech examples to create augmented training examples so that the identity of the dysarthric speakers, in terms of their speech errors, is retained. A two-stage transfer learning strategy is employed: the first stage introduces a category-specific low-frequency noise augmentation method, and the second stage implements a dysarthric speaker-specific data augmentation approach. The proposed method combines the advantages of various data augmentation approaches from the literature into a refined two-stage model that handles data augmentation without compromising the quality of the target model. This two-stage approach achieved a notable Word Error Rate (WER) reduction of approximately 11.369%, particularly for severely affected dysarthric speakers, compared with a transfer learning method that relies only on normal speech data for training.
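The abstract does not specify implementation details for the category-specific low-frequency noise augmentation. As a minimal sketch only, the code below shows one way such a step could be realized, assuming low-pass-filtered additive noise scaled by a severity-dependent signal-to-noise ratio; the category names, cutoff frequency, and SNR values are illustrative assumptions, not the authors' settings.

import numpy as np
from scipy.signal import butter, lfilter

# Hypothetical severity categories mapped to target SNRs (dB); the paper's
# actual categories and noise levels are not reproduced here.
CATEGORY_SNR_DB = {"mild": 25.0, "moderate": 20.0, "severe": 15.0}

def low_frequency_noise(num_samples, sample_rate, cutoff_hz=300.0):
    # White noise passed through a 4th-order Butterworth low-pass filter,
    # keeping only low-frequency components (the cutoff is an assumption).
    noise = np.random.randn(num_samples)
    b, a = butter(4, cutoff_hz / (sample_rate / 2), btype="low")
    return lfilter(b, a, noise)

def augment_utterance(speech, sample_rate, category):
    # Mix category-scaled low-frequency noise into a dysarthric utterance.
    noise = low_frequency_noise(len(speech), sample_rate)
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    snr_db = CATEGORY_SNR_DB[category]
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

Under this reading, the augmented utterances would presumably be used alongside the original dysarthric recordings when fine-tuning the pretrained acoustic model in the first transfer learning stage.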
Pages: 130
Related Papers
50 records in total
  • [21] Reinforcement Learning based Data Augmentation for Noise Robust Speech Emotion Recognition
    Ranjan, Sumit
    Chakraborty, Rupayan
    Kopparapu, Sunil Kumar
    INTERSPEECH 2024, 2024, : 1040 - 1044
  • [22] SENet-based speech emotion recognition using synthesis-style transfer data augmentation
    Rajan, R.
    Hridya Raj, T. V.
    International Journal of Speech Technology, 2023, 26 (04) : 1017 - 1030
  • [23] Learning about social category-based obligations
    Chalik, Lisa
    Rhodes, Marjorie
    COGNITIVE DEVELOPMENT, 2018, 48 : 117 - 124
  • [24] Improving Diacritical Arabic Speech Recognition: Transformer-Based Models with Transfer Learning and Hybrid Data Augmentation
    Alaqel, Haifa
    El Hindi, Khalil
    Information (Switzerland), 2025, 16 (03)
  • [25] Improving Recognition of Dysarthric Speech Using Severity Based Tempo Adaptation
    Bhat, Chitralekha
    Vachhani, Bhavik
    Kopparapu, Sunil
    Speech and Computer, 2016, 9811 : 370 - 377
  • [26] Deep Learning-Based Acoustic Feature Representations for Dysarthric Speech Recognition
    Latha, M.
    Shivakumar, M.
    Manjula, G.
    Hemakumar, M.
    Kumar, M. K.
    SN Computer Science, 4 (3)
  • [27] Android Malware Detection Using Category-Based Machine Learning Classifiers
    Alatwi, Huda Ali
    Oh, Tae
    Fokoue, Ernest
    Stackpole, Bill
    SIGITE'16: PROCEEDINGS OF THE 17TH ANNUAL CONFERENCE ON INFORMATION TECHNOLOGY EDUCATION, 2016, : 54 - 59
  • [28] Data augmentation method for underwater acoustic target recognition based on underwater acoustic channel modeling and transfer learning
    Li, Daihui
    Liu, Feng
    Shen, Tongsheng
    Chen, Liang
    Zhao, Dexin
    APPLIED ACOUSTICS, 2023, 208
  • [29] CycleGAN-based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition
    Bao, Fang
    Neumann, Michael
    Ngoc Thang Vu
    INTERSPEECH 2019, 2019, : 2828 - 2832
  • [30] Enhanced Speech Emotion Recognition Using DCGAN-Based Data Augmentation
    Baek, Ji-Young
    Lee, Seok-Pil
    Tsihrintzis, George A.
    ELECTRONICS, 2023, 12 (18)