Introduction to Partial Fine-Tuning: A Comprehensive Evaluation of End-to-End Children's Automatic Speech Recognition Adaptation

Cited by: 0
Authors
Rolland, Thomas [1 ,2 ]
Abad, Alberto [1 ,2 ]
Affiliations
[1] INESC ID, Lisbon, Portugal
[2] Univ Lisbon, Inst Super Tecn, Lisbon, Portugal
Source
INTERSPEECH 2024
Keywords
speech recognition; children speech; transfer learning; over-parameterisation;
DOI
10.21437/Interspeech.2024-1102
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic Speech Recognition (ASR) faces unique challenges with children's speech, mainly due to the scarcity of available data. Training large ASR models on such constrained data is difficult, so a fine-tuning strategy is frequently employed. However, fine-tuning an entire large pre-trained model on limited children's speech data may overfit, leading to decreased performance. This study offers a granular evaluation of fine-tuning for children's ASR, departing from conventional whole-network tuning. We present a partial fine-tuning approach that spotlights the importance of the encoder and feed-forward neural network modules in Transformer-based models. Remarkably, this method surpasses whole-model fine-tuning, with a relative word error rate improvement of 9% when data is limited. Our findings highlight the critical role of partial fine-tuning in advancing children's ASR model development.
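To make the partial fine-tuning idea concrete, the following is a minimal PyTorch-style sketch, not the authors' exact recipe: all parameters of a pre-trained Transformer ASR model are frozen, and only the encoder feed-forward sub-layers are left trainable. The module name substrings "encoder" and "feed_forward", as well as the loader load_pretrained_asr, are assumptions based on common Transformer ASR implementations and would need to match the actual model.

    # Minimal sketch of partial fine-tuning, assuming a PyTorch Transformer ASR model
    # whose parameter names contain "encoder" and "feed_forward"; illustration only.
    import torch
    from torch import nn

    def setup_partial_finetuning(model: nn.Module):
        """Freeze all weights, then unfreeze only the encoder feed-forward modules."""
        for param in model.parameters():
            param.requires_grad = False          # freeze the whole pre-trained model

        trainable = []
        for name, param in model.named_parameters():
            if "encoder" in name and "feed_forward" in name:
                param.requires_grad = True       # fine-tune only encoder FFN sub-layers
                trainable.append(param)
        return trainable

    # Usage sketch (load_pretrained_asr is a hypothetical loader for an adult-speech model):
    # model = load_pretrained_asr()
    # optimiser = torch.optim.Adam(setup_partial_finetuning(model), lr=1e-4)
    # ...then train on the limited children's speech data as usual...

Freezing the rest of the network reduces the number of trainable parameters, which is what limits overfitting on small children's speech corpora.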
Pages: 5178-5182
Page count: 5