MMATERIC: Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation

Cited by: 5
|
Authors
Liang, Xingwei [1 ,2 ]
Zou, You [1 ]
Zhuang, Xinnan [1 ]
Yang, Jie [3 ]
Niu, Taiyu [2 ]
Xu, Ruifeng [2 ]
Affiliations
[1] Konka Corp, Shenzhen 518053, Peoples R China
[2] Harbin Inst Technol, Joint Lab HIT Konka, Shenzhen 518055, Peoples R China
[3] No Arizona Univ, Sch Informat Comp & Cyber Syst, Flagstaff, AZ 86011 USA
Funding
National Natural Science Foundation of China;
Keywords
emotion recognition in conversation; Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation; multi-task learning; multimodal fusion;
DOI
10.3390/electronics12071534
Chinese Library Classification
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
The accurate recognition of emotions in conversation helps reveal a speaker's intentions and supports many analyses in artificial intelligence, especially in human-computer interaction systems. However, most previous methods lack the ability to track the evolving emotional state of each speaker in a dialogue. To alleviate this problem, we propose a new approach, Multi-Task Learning and Multi-Fusion AudioText Emotion Recognition in Conversation (MMATERIC). MMATERIC draws on and combines the benefits of two distinct tasks, emotion recognition in text and emotion recognition in speech, and produces fused multimodal features to recognize the emotions of different speakers in a dialogue. At the core of MMATERIC are three modules: an encoder with multimodal attention, a speaker emotion detection unit (SED-Unit), and a decoder with a speaker emotion detection Bi-LSTM (SED-Bi-LSTM). Together, these three modules model the changing emotions of a speaker at a given moment in a conversation. In addition, we adopt multiple fusion strategies at different stages, chiefly model fusion and decision-stage fusion, to improve the model's accuracy. Our multimodal framework also allows features to interact across modalities and permits potential adaptation flows from one modality to another. Experimental results on two benchmark datasets show that the proposed method is effective and outperforms state-of-the-art baseline methods. The performance improvement is mainly attributed to the combination of the three core modules of MMATERIC and the different fusion methods adopted at each stage.
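The decision-stage fusion mentioned in the abstract can be sketched minimally: each modality produces a per-utterance emotion probability distribution, and the final label is taken from a weighted combination of the two. The label set, weights, and function name below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

# Illustrative emotion label set (not necessarily the paper's).
EMOTIONS = ["happy", "sad", "angry", "neutral"]

def fuse_decisions(p_text, p_audio, w_text=0.6, w_audio=0.4):
    """Decision-stage fusion: weighted average of the text and audio
    emotion probability distributions, then argmax over labels.
    The modality weights here are arbitrary placeholders."""
    p = w_text * np.asarray(p_text, dtype=float) + \
        w_audio * np.asarray(p_audio, dtype=float)
    p = p / p.sum()  # renormalise to a valid distribution
    return EMOTIONS[int(np.argmax(p))], p

# Example: text strongly suggests "happy", audio mildly suggests "sad";
# with the text-heavier weighting, the fused decision is "happy".
label, probs = fuse_decisions([0.7, 0.1, 0.1, 0.1],
                              [0.2, 0.5, 0.2, 0.1])
```

In a full system each distribution would come from a trained per-modality classifier; model-stage fusion, by contrast, would merge the feature representations before classification rather than the output probabilities.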
Pages: 15
Related Papers
50 records
  • [21] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao, Huijuan
    Ye, Ning
    Wang, Ruchuan
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2021, 93 (2-3): 299-308
  • [22] Multi-task learning on the edge for effective gender, age, ethnicity and emotion recognition
    Foggia, Pasquale
    Greco, Antonio
    Saggese, Alessia
    Vento, Mario
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 118
  • [23] AdMISC: Advanced Multi-Task Learning and Feature-Fusion for Emotional Support Conversation
    Jia, Xuhui
    He, Jia
    Zhang, Qian
    Jin, Jin
    ELECTRONICS, 2024, 13 (08)
  • [24] A Multi-Scale Multi-Task Learning Model for Continuous Dimensional Emotion Recognition from Audio
    Li, Xia
    Lu, Guanming
    Yan, Jingjie
    Zhang, Zhengyan
    ELECTRONICS, 2022, 11 (03)
  • [25] Multi-Task Ensemble Learning for Affect Recognition
    Gjoreski, Martin
    Lustrek, Mitja
    Gams, Matjaz
    PROCEEDINGS OF THE 2018 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2018 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS (UBICOMP/ISWC'18 ADJUNCT), 2018: 553-558
  • [26] Multimodal Sentiment Recognition With Multi-Task Learning
    Zhang, Sun
    Yin, Chunyong
    Yin, Zhichao
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2023, 7 (01): 200-209
  • [27] Multi-Task Conformer with Multi-Feature Combination for Speech Emotion Recognition
    Seo, Jiyoung
    Lee, Bowon
    SYMMETRY-BASEL, 2022, 14 (07)
  • [28] Multi-task gradient descent for multi-task learning
    Bai, Lu
    Ong, Yew-Soon
    He, Tiantian
    Gupta, Abhishek
    MEMETIC COMPUTING, 2020, 12 (04): 355-369
  • [30] Speech Emotion Recognition Based on Multi-Task Learning Using a Convolutional Neural Network
    Kim, Nam Kyun
    Lee, Jiwon
    Ha, Hun Kyu
    Lee, Geon Woo
    Lee, Jung Hyuk
    Kim, Hong Kook
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017: 704-707