Weakly-supervised word-level pronunciation error detection in non-native English speech

Times cited: 1
Authors
Korzekwa, Daniel [1, 2]
Lorenzo-Trueba, Jaime [3]
Drugman, Thomas [3]
Calamaro, Shira [3]
Kostek, Bozena [2]
Affiliations
[1] Amazon, Warsaw, Poland
[2] Gdansk Univ Technol, Fac ETI, Gdansk, Poland
[3] Amazon, London, England
Source
INTERSPEECH 2021
Keywords
automated pronunciation assessment; speech processing; second-language learning; deep learning
DOI
10.21437/Interspeech.2021-38
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. Training this model does not require phonetically transcribed L2 speech; it only requires that mispronounced words be marked. Without phonetic transcriptions of L2 speech, the model must learn solely from the weak signal of word-level mispronunciation labels. For this reason, and because the amount of mispronounced L2 speech is limited, the model is prone to overfitting. To reduce this risk, we train it in a multi-task setup. The first task estimates the probability that each word is mispronounced. For the second task, we use a phoneme recognizer trained on phonetically transcribed L1 speech, which is easily accessible and can be annotated automatically. Compared to state-of-the-art approaches, we improve the accuracy of detecting word-level pronunciation errors, as measured by AUC, by 30% on the GUT Isle Corpus of L2 Polish speakers and by 21.5% on the Isle Corpus of L2 German and Italian speakers.
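The abstract combines two training signals (word-level mispronunciation labels on L2 speech and phoneme targets on L1 speech) and reports results as AUC over word-level scores. The following is a minimal Python sketch of those two ideas under stated assumptions, not the authors' implementation: the word task is assumed to use binary cross-entropy, the phoneme task is assumed to use a per-position cross-entropy (the actual recognizer loss may differ), and the weight `alpha` as well as all function names are hypothetical. AUC is computed with the Mann-Whitney formulation: the probability that a mispronounced word scores higher than a correctly pronounced one, with ties counted as one half.

```python
import math
from typing import Sequence


def word_bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over words (label 1 = mispronounced)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def phoneme_nll(phoneme_probs: Sequence[Sequence[float]], targets: Sequence[int],
                eps: float = 1e-7) -> float:
    """Assumed phoneme task: mean negative log-probability of the reference phoneme per position."""
    total = 0.0
    for dist, t in zip(phoneme_probs, targets):
        total -= math.log(max(dist[t], eps))
    return total / len(targets)


def multitask_loss(word_probs: Sequence[float], word_labels: Sequence[int],
                   phoneme_probs: Sequence[Sequence[float]], phoneme_targets: Sequence[int],
                   alpha: float = 0.5) -> float:
    """Hypothetical weighted sum: word-level loss (L2 data) + alpha * phoneme loss (L1 data)."""
    return word_bce(word_probs, word_labels) + alpha * phoneme_nll(phoneme_probs, phoneme_targets)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a mispronounced word > score of a correct word); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one mispronounced and one correct word")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy word-level probabilities from a detector and gold labels (1 = mispronounced).
    print(roc_auc([0.9, 0.8, 0.3, 0.5], [1, 0, 0, 1]))  # 0.75
```

The pairwise AUC loop is quadratic in the number of words and is meant only to make the metric's definition explicit; a production evaluation would typically use a library routine such as scikit-learn's `roc_auc_score`.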
Pages: 4408-4412
Number of pages: 5