A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-task Learning

Cited by: 1

Authors:

Ryu, Hyungshin [1]

Kim, Sunhee [2]

Chung, Minhwa [1]

Affiliations:

[1] Seoul Natl Univ, Dept Linguist, Seoul, South Korea

[2] Seoul Natl Univ, Dept French Language Educ, Seoul, South Korea

Source:

INTERSPEECH 2023 | 2023

Keywords:

computer-assisted pronunciation training; multi-task learning; mispronunciation detection and diagnosis; automatic pronunciation assessment; transfer learning; SPEECH; COMPREHENSIBILITY; INTELLIGIBILITY; ACCENTEDNESS; GRANULARITY;

DOI:

10.21437/Interspeech.2023-337

Chinese Library Classification (CLC):

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Empirical studies report a strong correlation between pronunciation proficiency scores and phonetic errors in human evaluators' assessments of non-native speech. However, existing computer-assisted pronunciation training (CAPT) systems treat automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD) as independent tasks and focus on improving each one individually. Motivated by the correlation between the two tasks, we propose a novel architecture that jointly tackles APA and MDD using CTC and cross-entropy criteria in a multi-task learning scheme that benefits both tasks. To leverage additional knowledge transfer, Wav2Vec2-robust fine-tuned on TIMIT is used for the joint optimization. The integrated model significantly outperforms single-task learning, with mean increases of 0.057 in PCC for APA and 0.004 in F1 for MDD on Speechocean762, which reveals that proficiency scores and phonetic errors are correlated in both human and model assessments.
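The abstract reports its gains in two metrics: the Pearson correlation coefficient (PCC) for APA and F1 for MDD. As a reminder of what these quantities measure, here is a minimal pure-Python sketch. It is not the authors' evaluation code. The function names, the toy scores, and the simplification of MDD to binary per-phone labels (1 = mispronounced) are all assumptions made for illustration only.

```python
import math


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def f1_score(gold, pred):
    """F1 for binary labels, where 1 marks a mispronounced phone (assumed encoding)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from Speechocean762.
human_scores = [2.0, 1.0, 0.0, 2.0, 1.0]   # human proficiency scores
model_scores = [1.8, 1.1, 0.3, 1.9, 0.7]   # model-predicted scores
gold_errors = [0, 1, 1, 0, 0, 1]           # annotated mispronunciations
pred_errors = [0, 1, 0, 0, 1, 1]           # detected mispronunciations

pcc = pearson_cc(human_scores, model_scores)
f1 = f1_score(gold_errors, pred_errors)
print(f"PCC = {pcc:.3f}, F1 = {f1:.3f}")
```

PCC rewards predicted scores that rise and fall with the human scores regardless of scale, while F1 balances precision and recall of the detected errors, so the two numbers in the abstract are not directly comparable in magnitude.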

Pages: 959-963

Page count: 5
