A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning

Cited by: 1
Authors
Ryu, Hyungshin [1]
Kim, Sunhee [2]
Chung, Minhwa [1]
Affiliations
[1] Seoul Natl Univ, Dept Linguist, Seoul, South Korea
[2] Seoul Natl Univ, Dept French Language Educ, Seoul, South Korea
Source
Keywords
computer-assisted pronunciation training; multi-task learning; mispronunciation detection and diagnosis; automatic pronunciation assessment; transfer learning; SPEECH; COMPREHENSIBILITY; INTELLIGIBILITY; ACCENTEDNESS; GRANULARITY
DOI
10.21437/Interspeech.2023-337
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Empirical studies report a strong correlation between pronunciation proficiency scores and phonetic errors in human evaluators' assessments of non-native speech. However, existing computer-assisted pronunciation training (CAPT) systems treat automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD) as independent tasks and focus on improving each individually. Motivated by the correlation between the two tasks, we propose a novel architecture that jointly tackles APA and MDD using CTC and cross-entropy criteria with a multi-task learning scheme to benefit both tasks. To leverage additional knowledge transfer, Wav2Vec2-robust fine-tuned on TIMIT is used for the joint optimization. The integrated model significantly outperforms single-task learning, with mean increases of 0.057 in PCC for APA and 0.004 in F1 for MDD on Speechocean762, which indicates that proficiency scores and phonetic errors are correlated in both human and model assessments.
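The abstract reports gains in two metrics: the Pearson correlation coefficient (PCC) for APA and the F1 score for MDD. As a rough illustration of how such metrics are commonly computed, here is a minimal standard-library sketch. The function names and toy inputs are hypothetical and are not taken from the paper, and the paper's exact evaluation protocol (for example, the granularity at which MDD errors are counted) is not stated in the abstract and is assumed here to be one binary flag per phone.

```python
import math


def pearson_cc(predicted, reference):
    """Pearson correlation between predicted and human proficiency scores."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_r)


def f1_score(predicted_flags, true_flags):
    """F1 over binary flags (assumption: 1 = mispronounced, 0 = correct)."""
    pairs = list(zip(predicted_flags, true_flags))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # detected errors
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false alarms
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # missed errors
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    print(pearson_cc([1.0, 2.0, 3.5], [1.5, 2.0, 4.0]))
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

Under this reading, a PCC increase means the model's scores track human ratings more closely, and an F1 increase means a better balance between false alarms and missed errors.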
Pages: 959-963
Page count: 5
Related papers
50 in total
  • [1] Phonological Feature Based Mispronunciation Detection and Diagnosis using Multi-Task DNNs and Active Learning
    Arora, Vipul
    Lahiri, Aditi
    Reetz, Henning
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1432 - 1436
  • [2] Multi-Task Learning for Mispronunciation Detection on Singapore Children's Mandarin Speech
    Tong, Rong
    Chen, Nancy F.
    Ma, Bin
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2193 - 2197
  • [3] Multi-Task Model and Feature Joint Learning
    Li, Ya
    Tian, Xinmei
    Liu, Tongliang
    Tao, Dacheng
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 3643 - 3649
  • [4] Mispronunciation detection and diagnosis with acoustic pronunciation model aided modeling
    Liu, Zongming
    Wang, Li
    Li, Junfeng
    Zhang, Pengyuan
    Shengxue Xuebao/Acta Acustica, 2023, 48 (01): 264 - 273
  • [5] Pronunciation Error Detection using DNN Articulatory Model based on Multi-lingual and Multi-task Learning
    Duan, Richeng
    Kawahara, Tatsuya
    Dantsuji, Masatake
    Zhang, Jinsong
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
  • [6] Joint Disaster Classification and Victim Detection using Multi-Task Learning
    Tham, Mau-Luen
    Wong, Yi Jie
    Kwan, Ban Hoe
    Owada, Yasunori
    Sein, Myint Myint
    Chang, Yoong Choon
    2021 IEEE 12TH ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2021, : 407 - 412
  • [7] Multi-Task Learning Based Joint Pulse Detection and Modulation Classification
    Akyon, Fatih Cagatay
    Nuhoglu, Mustafa Atahan
    Alp, Yasar Kemal
    Arikan, Orhan
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019
  • [8] Multi-Task Joint-Learning for Robust Voice Activity Detection
    Zhuang, Yimeng
    Tong, Sibo
    Yin, Maofan
    Qian, Yanmin
    Yu, Kai
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
  • [9] Multi-task multi-modal learning for joint diagnosis and prognosis of human cancers
    Shao, Wei
    Wang, Tongxin
    Sun, Liang
    Dong, Tianhan
    Han, Zhi
    Huang, Zhi
    Zhang, Jie
    Zhang, Daoqiang
    Huang, Kun
    MEDICAL IMAGE ANALYSIS, 2020, 65 (65)
  • [10] Multi-Task Based Mispronunciation Detection of Children Speech Using Multi-Lingual Information
    Wei, Linxuan
    Dong, Wenwei
    Lin, Binghuai
    Zhang, Jinsong
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1791 - 1794