MMATERIC: Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation

Cited: 5
|
Authors
Liang, Xingwei [1 ,2 ]
Zou, You [1 ]
Zhuang, Xinnan [1 ]
Yang, Jie [3 ]
Niu, Taiyu [2 ]
Xu, Ruifeng [2 ]
Affiliations
[1] Konka Corporation, Shenzhen 518053, China
[2] Harbin Institute of Technology, HIT-Konka Joint Laboratory, Shenzhen 518055, China
[3] Northern Arizona University, School of Informatics, Computing, and Cyber Systems, Flagstaff, AZ 86011, USA
Funding
National Natural Science Foundation of China
Keywords
emotion recognition in conversation; Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation; multi-task learning; multimodal fusion;
DOI
10.3390/electronics12071534
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The accurate recognition of emotions in conversations helps understand the speaker's intentions and facilitates various analyses in artificial intelligence, especially in human-computer interaction systems. However, most previous methods lack the ability to track the different emotional states of each speaker in a dialogue. To alleviate this dilemma, we propose a new approach, Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation (MMATERIC). MMATERIC draws on and combines the benefits of two distinct tasks, emotion recognition in text and emotion recognition in speech, and produces fused multimodal features to recognize the emotions of different speakers in a dialogue. At the core of MMATERIC are three modules: an encoder with multimodal attention, a speaker emotion detection unit (SED-Unit), and a decoder with a speaker emotion detection Bi-LSTM (SED-Bi-LSTM). Together, these three modules model the changing emotions of a speaker at a given moment in a conversation. Meanwhile, we adopt multiple fusion strategies at different stages, mainly model-stage fusion and decision-stage fusion, to improve the model's accuracy. Our multimodal framework also allows features to interact across modalities and lets potential adaptation flow from one modality to another. Experimental results on two benchmark datasets show that the proposed method is effective and outperforms state-of-the-art baseline methods. The performance improvement is mainly attributed to the combination of the three core modules of MMATERIC and the different fusion methods adopted at each stage.
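To make the described architecture concrete, below is a minimal PyTorch-style sketch of the pipeline the abstract outlines: unimodal text and audio branches, a multimodal-attention (model-stage) fusion step, a Bi-LSTM decoder over the utterance sequence for speaker emotion detection, and a decision-stage fusion of the branch predictions. All class names (CrossModalAttention, SEDBiLSTM, MMATERICSketch), feature dimensions, and the averaging used for decision-stage fusion are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the abstract's architecture, assuming a PyTorch implementation.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """Lets text features attend to audio features, standing in for the
    'encoder with multimodal attention' (model-stage fusion)."""
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, query_feats, key_value_feats):
        fused, _ = self.attn(query_feats, key_value_feats, key_value_feats)
        return fused


class SEDBiLSTM(nn.Module):
    """Bi-LSTM over a conversation's utterance sequence, tracking how a
    speaker's emotional state changes turn by turn."""
    def __init__(self, dim, hidden, n_classes):
        super().__init__()
        self.rnn = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, utterance_feats):
        ctx, _ = self.rnn(utterance_feats)   # (batch, turns, 2 * hidden)
        return self.head(ctx)                # per-turn emotion logits


class MMATERICSketch(nn.Module):
    def __init__(self, text_dim=768, audio_dim=256, dim=256, n_classes=6):
        super().__init__()
        # Linear projections stand in for pretrained text / speech encoders.
        self.text_proj = nn.Linear(text_dim, dim)
        self.audio_proj = nn.Linear(audio_dim, dim)
        self.fusion = CrossModalAttention(dim)
        self.decoder = SEDBiLSTM(dim, hidden=128, n_classes=n_classes)
        # Auxiliary unimodal heads: the multi-task branches for text-only and
        # audio-only emotion recognition.
        self.text_head = nn.Linear(dim, n_classes)
        self.audio_head = nn.Linear(dim, n_classes)

    def forward(self, text_feats, audio_feats):
        t = self.text_proj(text_feats)       # (batch, turns, dim)
        a = self.audio_proj(audio_feats)     # (batch, turns, dim)
        fused = self.fusion(t, a)            # model-stage fusion
        logits_multi = self.decoder(fused)   # multimodal branch
        logits_text = self.text_head(t)      # auxiliary text task
        logits_audio = self.audio_head(a)    # auxiliary audio task
        # Decision-stage fusion: average the branches (weighting is a guess).
        logits = (logits_multi + logits_text + logits_audio) / 3.0
        return logits, logits_text, logits_audio


if __name__ == "__main__":
    model = MMATERICSketch()
    text = torch.randn(2, 10, 768)    # 2 dialogues, 10 turns, text features
    audio = torch.randn(2, 10, 256)   # matching per-turn audio features
    fused_logits, _, _ = model(text, audio)
    print(fused_logits.shape)         # torch.Size([2, 10, 6])
```

In a multi-task setup of this kind, the unimodal heads would typically receive their own loss terms during training so the text and speech tasks regularize the shared encoders; the exact loss weighting used by the paper is not specified here.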
Pages: 15