Exploring Native and Non-Native English Child Speech Recognition With Whisper

Cited by: 0

Authors:

Jain, Rishabh [1]

Barcovschi, Andrei [1]

Yiwere, Mariam Yahayah [1]

Corcoran, Peter [1]

Cucu, Horia [2]

Affiliations:

[1] Univ Galway, Sch Elect & Elect Engn, Galway H91 TK33, Ireland

[2] Univ Politehn Bucuresti, Speech & Dialogue Res Lab, Bucharest 060042, Romania

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Child automatic speech recognition; whisper; large-scale supervision; MyST; PFSTAR; CMU_Kids; speechocean762; non-native child speech

DOI:

10.1109/ACCESS.2024.3378738

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Subject classification code:

0812

Abstract:

Modern end-to-end Automatic Speech Recognition (ASR) systems struggle to recognise children's speech. This challenge is due to the high acoustic variability in children's voices and the scarcity of child speech training data, particularly for accented or low-resource languages. This study focuses on improving the performance of ASR on native and non-native English child speech using publicly available datasets. We evaluate how the large-scale whisper models (trained with a large amount of adult speech data) perform with child speech. In addition, we perform finetuning experiments using different child speech datasets to investigate the performance of whisper ASR on non-native English-speaking children's speech. Our findings indicate relative Word Error Rate (WER) improvements ranging from 29% to 89% over previous benchmarks on the same datasets. Notably, these gains were achieved by finetuning with only a 10% sample of unseen non-native datasets. These results demonstrate the potential of whisper for improving ASR in a low-resource scenario for non-native child speech.
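The abstract reports results as relative Word Error Rate (WER) improvements. As a rough illustration of how such a figure is typically computed, the sketch below derives WER from a word-level edit distance and then the relative improvement between two WER values. The function names and the example sentences and numbers are hypothetical and are not taken from the paper, which may apply text normalisation or scoring tools not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (0.5 means 50%)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.3f}")
    print(f"Relative improvement: {relative_wer_improvement(20.0, 10.0):.0%}")
```

Under this definition, a relative improvement is measured against the baseline WER, so a drop from 20% to 10% WER is a 50% relative improvement even though the absolute difference is 10 percentage points.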


Pages: 41601-41610

Page count: 10
