Exploring Native and Non-Native English Child Speech Recognition With Whisper

Authors
Jain, Rishabh [1]
Barcovschi, Andrei [1]
Yiwere, Mariam Yahayah [1]
Corcoran, Peter [1]
Cucu, Horia [2]
Affiliations
[1] Univ Galway, Sch Elect & Elect Engn, Galway H91 TK33, Ireland
[2] Univ Politehn Bucuresti, Speech & Dialogue Res Lab, Bucharest 060042, Romania
Keywords
Child automatic speech recognition; Whisper; large-scale supervision; MyST; PFSTAR; CMU_Kids; speechocean762; non-native child speech
DOI
10.1109/ACCESS.2024.3378738
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Modern end-to-end Automatic Speech Recognition (ASR) systems struggle to recognise children's speech. This challenge is due to the high acoustic variability in children's voices and the scarcity of child speech training data, particularly for accented speech or low-resource languages. This study focuses on improving the performance of ASR on native and non-native English child speech using publicly available datasets. We evaluate how the large-scale Whisper models (trained with a large amount of adult speech data) perform with child speech. In addition, we perform finetuning experiments using different child speech datasets to investigate the performance of Whisper ASR on non-native English-speaking children's speech. Our findings indicate relative Word Error Rate (WER) improvements ranging from 29% to 89% over previous benchmarks on the same datasets. Notably, these gains were achieved by finetuning with only a 10% sample of unseen non-native datasets. These results demonstrate the potential of Whisper for improving ASR in a low-resource scenario for non-native child speech.
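The abstract reports results as relative WER improvements. As a minimal sketch of what that metric usually means, the Python below computes WER as word-level edit distance divided by the reference length, then the relative improvement between a baseline and a new system. The function names `wer` and `relative_wer_improvement` are hypothetical, the example sentences and numbers are illustrative only, and the record does not say how the authors computed or normalised their scores.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words.

    Assumes whitespace tokenisation with no text normalisation, which is a
    simplification; real evaluations usually normalise case and punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    score = wer("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {score:.3f}")
    print(f"Relative improvement: {relative_wer_improvement(0.20, 0.10):.0%}")
```

Under this definition, a system that lowers WER from 20% to 10% shows a 50% relative improvement, even though the absolute drop is 10 percentage points.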
Pages: 41601-41610
Page count: 10