MMATERIC: Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation

Cited: 5
|
Authors
Liang, Xingwei [1 ,2 ]
Zou, You [1 ]
Zhuang, Xinnan [1 ]
Yang, Jie [3 ]
Niu, Taiyu [2 ]
Xu, Ruifeng [2 ]
Affiliations
[1] Konka Corporation, Shenzhen 518053, China
[2] Harbin Institute of Technology, HIT-Konka Joint Laboratory, Shenzhen 518055, China
[3] Northern Arizona University, School of Informatics, Computing, and Cyber Systems, Flagstaff, AZ 86011, USA
Funding
National Natural Science Foundation of China
Keywords
emotion recognition in conversation; Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation; multi-task learning; multimodal fusion;
DOI
10.3390/electronics12071534
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The accurate recognition of emotions in conversations helps understand the speaker's intentions and facilitates various analyses in artificial intelligence, especially in human-computer interaction systems. However, most previous methods lack the ability to track the different emotional states of each speaker in a dialogue. To alleviate this dilemma, we propose a new approach, Multi-Task Learning and Multi-Fusion for AudioText Emotion Recognition in Conversation (MMATERIC). MMATERIC draws on and combines the benefits of two distinct tasks, emotion recognition in text and emotion recognition in speech, and produces fused multimodal features to recognize the emotions of different speakers in a dialogue. At the core of MMATERIC are three modules: an encoder with multimodal attention, a speaker emotion detection unit (SED-Unit), and a decoder with a speaker emotion detection Bi-LSTM (SED-Bi-LSTM). Together, these three modules model the changing emotions of a speaker at a given moment in a conversation. Meanwhile, we adopt multiple fusion strategies at different stages, mainly model-stage fusion and decision-stage fusion, to improve the model's accuracy. Our multimodal framework also allows features to interact across modalities and lets potential adaptation flow from one modality to another. Experimental results on two benchmark datasets show that the proposed method is effective and outperforms state-of-the-art baseline methods. The performance improvement is mainly attributed to the combination of the three core modules of MMATERIC and the different fusion methods adopted at each stage.
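To make the described architecture concrete, below is a minimal PyTorch-style sketch of the pipeline the abstract outlines: unimodal text and audio branches, a multimodal-attention (model-stage) fusion step, a Bi-LSTM decoder over the utterance sequence for speaker emotion detection, and a decision-stage fusion of the branch predictions. All class names (CrossModalAttention, SEDBiLSTM, MMATERICSketch), feature dimensions, and the averaging used for decision-stage fusion are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the abstract's architecture, assuming a PyTorch implementation.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """Lets text features attend to audio features, standing in for the
    'encoder with multimodal attention' (model-stage fusion)."""
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, query_feats, key_value_feats):
        fused, _ = self.attn(query_feats, key_value_feats, key_value_feats)
        return fused


class SEDBiLSTM(nn.Module):
    """Bi-LSTM over a conversation's utterance sequence, tracking how a
    speaker's emotional state changes turn by turn."""
    def __init__(self, dim, hidden, n_classes):
        super().__init__()
        self.rnn = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, utterance_feats):
        ctx, _ = self.rnn(utterance_feats)   # (batch, turns, 2 * hidden)
        return self.head(ctx)                # per-turn emotion logits


class MMATERICSketch(nn.Module):
    def __init__(self, text_dim=768, audio_dim=256, dim=256, n_classes=6):
        super().__init__()
        # Linear projections stand in for pretrained text / speech encoders.
        self.text_proj = nn.Linear(text_dim, dim)
        self.audio_proj = nn.Linear(audio_dim, dim)
        self.fusion = CrossModalAttention(dim)
        self.decoder = SEDBiLSTM(dim, hidden=128, n_classes=n_classes)
        # Auxiliary unimodal heads: the multi-task branches for text-only and
        # audio-only emotion recognition.
        self.text_head = nn.Linear(dim, n_classes)
        self.audio_head = nn.Linear(dim, n_classes)

    def forward(self, text_feats, audio_feats):
        t = self.text_proj(text_feats)       # (batch, turns, dim)
        a = self.audio_proj(audio_feats)     # (batch, turns, dim)
        fused = self.fusion(t, a)            # model-stage fusion
        logits_multi = self.decoder(fused)   # multimodal branch
        logits_text = self.text_head(t)      # auxiliary text task
        logits_audio = self.audio_head(a)    # auxiliary audio task
        # Decision-stage fusion: average the branches (weighting is a guess).
        logits = (logits_multi + logits_text + logits_audio) / 3.0
        return logits, logits_text, logits_audio


if __name__ == "__main__":
    model = MMATERICSketch()
    text = torch.randn(2, 10, 768)    # 2 dialogues, 10 turns, text features
    audio = torch.randn(2, 10, 256)   # matching per-turn audio features
    fused_logits, _, _ = model(text, audio)
    print(fused_logits.shape)         # torch.Size([2, 10, 6])
```

In a multi-task setup of this kind, the unimodal heads would typically receive their own loss terms during training so the text and speech tasks regularize the shared encoders; the exact loss weighting used by the paper is not specified here.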
Pages: 15