A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning

Cited by: 1
Authors
Ryu, Hyungshin [1]
Kim, Sunhee [2]
Chung, Minhwa [1]
Affiliations
[1] Seoul Natl Univ, Dept Linguist, Seoul, South Korea
[2] Seoul Natl Univ, Dept French Language Educ, Seoul, South Korea
Source
Keywords
computer-assisted pronunciation training; multi-task learning; mispronunciation detection and diagnosis; automatic pronunciation assessment; transfer learning; SPEECH; COMPREHENSIBILITY; INTELLIGIBILITY; ACCENTEDNESS; GRANULARITY;
DOI
10.21437/Interspeech.2023-337
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Empirical studies report a strong correlation between pronunciation proficiency scores and phonetic errors in human evaluators' assessments of non-native speech. However, existing computer-assisted pronunciation training (CAPT) systems treat automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD) as independent tasks and focus on improving each individually. Motivated by the correlation between the two tasks, we propose a novel architecture that jointly tackles APA and MDD using CTC and cross-entropy criteria with a multi-task learning scheme to benefit both tasks. To leverage additional knowledge transfer, Wav2Vec2-robust fine-tuned on TIMIT is used for the joint optimization. The integrated model significantly outperforms single-task learning, with mean increases of 0.057 in PCC for APA and 0.004 in F1 for MDD on Speechocean762, which reveals that proficiency scores and phonetic errors are correlated for both human and model assessments.
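The two metrics named in the abstract, the Pearson correlation coefficient (PCC) for APA and F1 for MDD, can be illustrated with a minimal sketch. The function names, the toy inputs, and the choice to treat "mispronounced" as the positive class are assumptions made here for illustration; they are not taken from the paper's evaluation code.

```python
import math


def pearson_cc(xs, ys):
    """Pearson correlation between two equal-length score sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def f1_score(gold, pred):
    """F1 over binary labels; 1 marks the positive class.

    Assumption: the positive class is 'mispronounced phone'.
    """
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: human vs. predicted proficiency scores,
# and gold vs. predicted per-phone mispronunciation flags.
human_scores = [1.0, 2.0, 3.0]
model_scores = [2.0, 4.0, 6.0]
gold_flags = [1, 1, 0, 0]
pred_flags = [1, 0, 1, 0]

pcc = pearson_cc(human_scores, model_scores)
f1 = f1_score(gold_flags, pred_flags)
```

On this scale, the reported gains of 0.057 PCC and 0.004 F1 are absolute differences between the joint model's and the single-task model's metric values.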
Pages: 959-963
Page count: 5