Weakly-supervised word-level pronunciation error detection in non-native English speech

Cited by: 1
Authors
Korzekwa, Daniel [1, 2]
Lorenzo-Trueba, Jaime [3 ]
Drugman, Thomas [3 ]
Calamaro, Shira [3 ]
Kostek, Bozena [2 ]
Affiliations
[1] Amazon, Warsaw, Poland
[2] Gdansk Univ Technol, Fac ETI, Gdansk, Poland
[3] Amazon, London, England
Source
Keywords
automated pronunciation assessment; speech processing; second-language learning; deep learning
DOI
10.21437/Interspeech.2021-38
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. Training this model does not require phonetically transcribed L2 speech; only the mispronounced words need to be marked. Without phonetic transcriptions of L2 speech, the model has to learn from the weak signal of word-level mispronunciation labels alone. This, together with the limited amount of mispronounced L2 speech, makes the model more likely to overfit. To limit this risk, we train it in a multi-task setup. In the first task, we estimate the probabilities of word-level mispronunciation. For the second task, we use a phoneme recognizer trained on phonetically transcribed L1 speech, which is easily accessible and can be automatically annotated. Compared to state-of-the-art approaches, we improve the accuracy of detecting word-level pronunciation errors, measured by AUC, by 30% on the GUT Isle Corpus of L2 Polish speakers and by 21.5% on the Isle Corpus of L2 German and Italian speakers.
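The multi-task setup described in the abstract can be illustrated with a minimal sketch of a combined training objective. This is not the authors' implementation: the abstract does not state the loss functions, the model architecture, or how the two tasks are weighted. The sketch assumes a binary cross-entropy loss for the word-level mispronunciation task, a categorical cross-entropy loss for the phoneme recognition task, and a hypothetical weighting parameter `aux_weight`; all function names are illustrative.

```python
import math

EPS = 1e-12  # guards against log(0)


def word_level_loss(p_mispronounced, label):
    """Binary cross-entropy for one word.

    label is 1 if the word was marked as mispronounced, else 0.
    (Assumed loss; the abstract does not specify it.)
    """
    p = min(max(p_mispronounced, EPS), 1.0 - EPS)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def phoneme_loss(phoneme_probs, target_index):
    """Cross-entropy for one phoneme: negative log-probability of the reference phoneme."""
    return -math.log(max(phoneme_probs[target_index], EPS))


def multi_task_loss(word_preds, word_labels, phoneme_preds, phoneme_targets,
                    aux_weight=0.5):
    """Weighted sum of the two task losses, each averaged over its own batch.

    word_preds / word_labels: predictions and weak labels for L2 words.
    phoneme_preds / phoneme_targets: phoneme distributions and reference
    phoneme indices for transcribed L1 speech.
    aux_weight: hypothetical weight of the auxiliary phoneme task.
    """
    main = sum(word_level_loss(p, y)
               for p, y in zip(word_preds, word_labels)) / len(word_labels)
    aux = sum(phoneme_loss(d, t)
              for d, t in zip(phoneme_preds, phoneme_targets)) / len(phoneme_targets)
    return main + aux_weight * aux


# Toy example: two L2 words with weak labels, two L1 phonemes with references.
loss = multi_task_loss(
    word_preds=[0.9, 0.2], word_labels=[1, 0],
    phoneme_preds=[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], phoneme_targets=[0, 1],
)
print(f"combined loss: {loss:.4f}")
```

In a sketch like this, the auxiliary phoneme term acts as a regularizer: the shared model is also pushed to explain transcribed L1 speech, which is the mechanism the abstract gives for reducing overfitting to the small set of word-level labels.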
Pages: 4408 - 4412
Number of pages: 5
Related papers
50 in total
  • [41] Non-native English Teachers' Views towards Pedagogic Goals and Models of Pronunciation
    Takagishi, Ryosuke
    ASIAN ENGLISHES, 2012, 15 (02) : 108 - 135
  • [42] LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech
    Kim, Haechan
    Myung, Junho
    Kim, Seoyoung
    Lee, Sungpah
    Kang, Dongyeop
    Kim, Juho
    INTERSPEECH 2024, 2024, : 2325 - 2329
  • [43] TEACHING IDIOMS AND FIGURES OF SPEECH TO NON-NATIVE SPEAKERS OF ENGLISH
    ADKINS, PG
    MODERN LANGUAGE JOURNAL, 1968, 52 (03) : 148 - 152
  • [44] Synthesizing Near Native-accented Speech for a Non-native Speaker by Imitating the Pronunciation and Prosody of a Native Speaker
    Chung, Raymond
    Mak, Brian
    INTERSPEECH 2022, 2022, : 4302 - 4306
  • [45] Articulatory Modeling for Pronunciation Error Detection without Non-Native Training Data Based on DNN Transfer Learning
    Duan, Richeng
    Kawahara, Tatsuya
    Dantsuji, Masatake
    Zhang, Jinsong
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017, E100D (09) : 2174 - 2182
  • [46] IMITATION OF ENGLISH VOWEL DURATION UPON EXPOSURE TO NATIVE AND NON-NATIVE SPEECH
    Zajac, Magdalena
    Rojczyk, Arkadiusz
    POZNAN STUDIES IN CONTEMPORARY LINGUISTICS, 2014, 50 (04) : 495 - 514
  • [47] Synthesized speech intelligibility among native speakers and non-native speakers of English
    Alamsaputra, Diane Mayasari
    Kohnert, Kathryn J.
    Munson, Benjamin
    Reichle, Joe
    AUGMENTATIVE AND ALTERNATIVE COMMUNICATION, 2006, 22 (04) : 258 - 268
  • [48] Supervised and unsupervised learning of multidimensionally varying non-native speech categories
    Goudbeek, Martijn
    Cutler, Anne
    Smits, Roel
    SPEECH COMMUNICATION, 2008, 50 (02) : 109 - 125
  • [49] Listening to accents: Comprehensibility, accentedness and intelligibility of native and non-native English speech
    Verbeke, Gil
    Simon, Ellen
    LINGUA, 2023, 292
  • [50] Native and non-native talkers' mutual speech intelligibility of English focus sentences
    Lee, Joo-Kyeong
    LINGUISTIC RESEARCH, 2014, 31 (03) : 441 - 463