MMATERIC: Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation

Cited by: 5
|
Authors
Liang, Xingwei [1 ,2 ]
Zou, You [1 ]
Zhuang, Xinnan [1 ]
Yang, Jie [3 ]
Niu, Taiyu [2 ]
Xu, Ruifeng [2 ]
Affiliations
[1] Konka Corp, Shenzhen 518053, Peoples R China
[2] Harbin Inst Technol, Joint Lab HIT Konka, Shenzhen 518055, Peoples R China
[3] No Arizona Univ, Sch Informat Comp & Cyber Syst, Flagstaff, AZ 86011 USA
Funding
National Natural Science Foundation of China;
Keywords
emotion recognition in conversation; Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation; multi-task learning; multimodal fusion;
DOI
10.3390/electronics12071534
Chinese Library Classification
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
The accurate recognition of emotions in conversation helps reveal a speaker's intentions and supports many analyses in artificial intelligence, especially in human-computer interaction systems. However, most previous methods lack the ability to track the evolving emotional state of each speaker in a dialogue. To alleviate this problem, we propose a new approach, Multi-Task Learning and Multi-Fusion AudioText Emotion Recognition in Conversation (MMATERIC). MMATERIC draws on and combines the benefits of two distinct tasks, emotion recognition in text and emotion recognition in speech, and produces fused multimodal features to recognize the emotions of different speakers in a dialogue. At the core of MMATERIC are three modules: an encoder with multimodal attention, a speaker emotion detection unit (SED-Unit), and a decoder with a speaker emotion detection Bi-LSTM (SED-Bi-LSTM). Together, these three modules model the changing emotions of a speaker at a given moment in a conversation. In addition, we adopt multiple fusion strategies at different stages, chiefly model fusion and decision-stage fusion, to improve the model's accuracy. Our multimodal framework also allows features to interact across modalities and permits potential adaptation flows from one modality to another. Experimental results on two benchmark datasets show that the proposed method is effective and outperforms state-of-the-art baseline methods. The performance improvement is mainly attributed to the combination of the three core modules of MMATERIC and the different fusion methods adopted at each stage.
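The decision-stage fusion mentioned in the abstract can be sketched minimally: each modality produces a per-utterance emotion probability distribution, and the final label is taken from a weighted combination of the two. The label set, weights, and function name below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

# Illustrative emotion label set (not necessarily the paper's).
EMOTIONS = ["happy", "sad", "angry", "neutral"]

def fuse_decisions(p_text, p_audio, w_text=0.6, w_audio=0.4):
    """Decision-stage fusion: weighted average of the text and audio
    emotion probability distributions, then argmax over labels.
    The modality weights here are arbitrary placeholders."""
    p = w_text * np.asarray(p_text, dtype=float) + \
        w_audio * np.asarray(p_audio, dtype=float)
    p = p / p.sum()  # renormalise to a valid distribution
    return EMOTIONS[int(np.argmax(p))], p

# Example: text strongly suggests "happy", audio mildly suggests "sad";
# with the text-heavier weighting, the fused decision is "happy".
label, probs = fuse_decisions([0.7, 0.1, 0.1, 0.1],
                              [0.2, 0.5, 0.2, 0.1])
```

In a full system each distribution would come from a trained per-modality classifier; model-stage fusion, by contrast, would merge the feature representations before classification rather than the output probabilities.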
Pages: 15
Related Papers
50 records
  • [21] Coarse-to-Fine Speech Emotion Recognition Based on Multi-Task Learning
    Zhao, Huijuan
    Ye, Ning
    Wang, Ruchuan
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2021, 93 (2-3): 299-308
  • [22] Multi-task learning on the edge for effective gender, age, ethnicity and emotion recognition
    Foggia, Pasquale
    Greco, Antonio
    Saggese, Alessia
    Vento, Mario
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 118
  • [23] AdMISC: Advanced Multi-Task Learning and Feature-Fusion for Emotional Support Conversation
    Jia, Xuhui
    He, Jia
    Zhang, Qian
    Jin, Jin
    ELECTRONICS, 2024, 13 (08)
  • [24] A Multi-Scale Multi-Task Learning Model for Continuous Dimensional Emotion Recognition from Audio
    Li, Xia
    Lu, Guanming
    Yan, Jingjie
    Zhang, Zhengyan
    ELECTRONICS, 2022, 11 (03)
  • [25] Multi-Task Ensemble Learning for Affect Recognition
    Gjoreski, Martin
    Lustrek, Mitja
    Gams, Matjaz
    PROCEEDINGS OF THE 2018 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2018 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS (UBICOMP/ISWC'18 ADJUNCT), 2018: 553-558
  • [26] Multimodal Sentiment Recognition With Multi-Task Learning
    Zhang, Sun
    Yin, Chunyong
    Yin, Zhichao
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2023, 7 (01): 200-209
  • [27] Multi-Task Conformer with Multi-Feature Combination for Speech Emotion Recognition
    Seo, Jiyoung
    Lee, Bowon
    SYMMETRY-BASEL, 2022, 14 (07)
  • [28] Multi-task gradient descent for multi-task learning
    Bai, Lu
    Ong, Yew-Soon
    He, Tiantian
    Gupta, Abhishek
    MEMETIC COMPUTING, 2020, 12 (04): 355-369
  • [30] Speech Emotion Recognition Based on Multi-Task Learning Using a Convolutional Neural Network
    Kim, Nam Kyun
    Lee, Jiwon
    Ha, Hun Kyu
    Lee, Geon Woo
    Lee, Jung Hyuk
    Kim, Hong Kook
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017: 704-707