MMATERIC: Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation

Cited: 5
Authors
Liang, Xingwei [1 ,2 ]
Zou, You [1 ]
Zhuang, Xinnan [1 ]
Yang, Jie [3 ]
Niu, Taiyu [2 ]
Xu, Ruifeng [2 ]
Affiliations
[1] Konka Corp, Shenzhen 518053, Peoples R China
[2] Harbin Inst Technol, Joint Lab HIT Konka, Shenzhen 518055, Peoples R China
[3] No Arizona Univ, Sch Informat Comp & Cyber Syst, Flagstaff, AZ 86011 USA
Funding
National Natural Science Foundation of China
Keywords
emotion recognition in conversation; Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation; multi-task learning; multimodal fusion;
DOI
10.3390/electronics12071534
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
The accurate recognition of emotions in conversations helps understand the speaker's intentions and facilitates various analyses in artificial intelligence, especially in human-computer interaction systems. However, most previous methods lack the ability to track the distinct emotional states of each speaker in a dialogue. To alleviate this dilemma, we propose a new approach, Multi-Task Learning and Multi-Fusion AudioText Emotion Recognition in Conversation (MMATERIC), for emotion recognition in conversation. MMATERIC draws on and combines the benefits of two distinct tasks, emotion recognition in text and emotion recognition in speech, and produces fused multimodal features to recognize the emotions of different speakers in a dialogue. At the core of MMATERIC are three modules: an encoder with multimodal attention, a speaker emotion detection unit (SED-Unit), and a decoder with speaker emotion detection Bi-LSTM (SED-Bi-LSTM). Together, these three modules model the changing emotions of a speaker at a given moment in a conversation. Meanwhile, we adopt multiple fusion strategies at different stages, mainly model fusion and decision-stage fusion, to improve the model's accuracy. Our multimodal framework also allows features to interact across modalities and permits potential adaptation flows from one modality to another. Experimental results on two benchmark datasets show that our proposed method is effective and outperforms state-of-the-art baseline methods. The performance improvement is mainly attributed to the combination of the three core modules of MMATERIC and the different fusion methods adopted at each stage.
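The abstract mentions decision-stage fusion as one of the strategies used to combine the text and speech emotion recognizers. The paper does not publish its fusion weights or label set, so the sketch below is a minimal, hypothetical illustration of the general technique: each modality's classifier emits a probability distribution over emotion labels, and the fused prediction is a weighted average of the two distributions. The `EMOTIONS` list, `fuse_decisions` helper, and `text_weight` value are all assumptions for illustration only.

```python
# Hypothetical sketch of decision-stage (late) fusion of two unimodal
# emotion classifiers. All names and weights here are illustrative; the
# MMATERIC paper's actual fusion parameters are not reproduced.

EMOTIONS = ["happy", "sad", "angry", "neutral"]  # assumed label set

def fuse_decisions(text_probs, audio_probs, text_weight=0.6):
    """Weighted average of per-modality emotion distributions.

    `text_weight` is a hypothetical hyperparameter controlling how much
    the text modality contributes relative to audio.
    """
    audio_weight = 1.0 - text_weight
    fused = [text_weight * t + audio_weight * a
             for t, a in zip(text_probs, audio_probs)]
    total = sum(fused)                 # renormalize so probabilities sum to 1
    return [p / total for p in fused]

def predict(text_probs, audio_probs, text_weight=0.6):
    """Return the emotion label with the highest fused probability."""
    fused = fuse_decisions(text_probs, audio_probs, text_weight)
    return EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]

if __name__ == "__main__":
    text_probs = [0.10, 0.60, 0.20, 0.10]    # text model leans "sad"
    audio_probs = [0.05, 0.30, 0.55, 0.10]   # audio model leans "angry"
    # With text_weight=0.6, fused "sad" (0.48) beats "angry" (0.34).
    print(predict(text_probs, audio_probs))  # -> sad
```

Decision-stage fusion like this keeps each unimodal model independent, which is what distinguishes it from the model-stage fusion the abstract also mentions, where intermediate features interact before classification.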
Pages: 15
Related Papers (50 records)
  • [31] LEVERAGING VALENCE AND ACTIVATION INFORMATION VIA MULTI-TASK LEARNING FOR CATEGORICAL EMOTION RECOGNITION
    Xia, Rui
    Liu, Yang
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5301 - 5305
  • [32] Multi-Task and Attention Collaborative Network for Facial Emotion Recognition
    Wang, Xiaohua
    Yu, Cong
    Gu, Yu
    Hu, Min
    Ren, Fuji
    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2021, 16 (04) : 568 - 576
  • [33] SELECTIVE MULTI-TASK LEARNING FOR SPEECH EMOTION RECOGNITION USING CORPORA OF DIFFERENT STYLES
    Zhang, Heran
    Mimura, Masato
    Kawahara, Tatsuya
    Ishizuka, Kenkichi
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7707 - 7711
  • [34] MTLFuseNet: A novel emotion recognition model based on deep latent feature fusion of EEG signals and multi-task learning
    Li, Rui
    Ren, Chao
    Ge, Yiqing
    Zhao, Qiqi
    Yang, Yikun
    Shi, Yuhan
    Zhang, Xiaowei
    Hu, Bin
    KNOWLEDGE-BASED SYSTEMS, 2023, 276
  • [35] MULTI-MODAL MULTI-TASK DEEP LEARNING FOR SPEAKER AND EMOTION RECOGNITION OF TV-SERIES DATA
    Novitasari, Sashi
    Quoc Truong Do
    Sakti, Sakriani
    Lestari, Dessi
    Nakamura, Satoshi
    2018 ORIENTAL COCOSDA - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2018, : 37 - 42
  • [36] Multi-label emotion classification based on adversarial multi-task learning
    Lin, Nankai
    Fu, Sihui
    Lin, Xiaotian
    Wang, Lianxi
    INFORMATION PROCESSING & MANAGEMENT, 2022, 59 (06)
  • [37] A Novel DE-CNN-BiLSTM Multi-Fusion Model for EEG Emotion Recognition
    Cui, Fachang
    Wang, Ruqing
    Ding, Weiwei
    Chen, Yao
    Huang, Liya
    MATHEMATICS, 2022, 10 (04)
  • [38] MULTI-OBJECTIVE MULTI-TASK LEARNING ON RNNLM FOR SPEECH RECOGNITION
    Song, Minguang
    Zhao, Yunxin
    Wang, Shaojun
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 197 - 203
  • [39] Multi-Domain and Multi-Task Learning for Human Action Recognition
    Liu, An-An
    Xu, Ning
    Nie, Wei-Zhi
    Su, Yu-Ting
    Zhang, Yong-Dong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (02) : 853 - 867
  • [40] Multi-EmoNet: A Novel Multi-Task Neural Network for Driver Emotion Recognition
    Cui, Yaodong
    Ma, Yintao
    Li, Wenbo
    Bian, Ning
    Li, Guofa
    Cao, Dongpu
    IFAC PAPERSONLINE, 2020, 53 (05): : 650 - 655