A multitask learning model for multimodal sarcasm, sentiment and emotion recognition in conversations

Cited by: 48
Authors
Zhang, Yazhou [1 ]
Wang, Jinglin [2 ]
Liu, Yaochen [2 ]
Rong, Lu [1 ]
Zheng, Qian [1 ]
Song, Dawei [2 ]
Tiwari, Prayag [3 ]
Qin, Jing [4 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 450002, Peoples R China
[2] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[3] Halmstad Univ, Sch Informat Technol, Halmstad, Sweden
[4] Hong Kong Polytech Univ, Ctr Smart Hlth, Sch Nursing, Hong Kong, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Multimodal sarcasm recognition; Sentiment analysis; Emotion recognition; Multitask learning; Affective computing; INTERACTION DYNAMICS;
DOI
10.1016/j.inffus.2023.01.005
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Sarcasm, sentiment and emotion are tightly coupled with each other in that one helps the understanding of another, which makes the joint recognition of sarcasm, sentiment and emotion in conversation a research focus in artificial intelligence (AI) and affective computing. Three main challenges exist: context dependency, multimodal fusion and multitask interaction. However, most of the existing works fail to explicitly leverage and model the relationships among related tasks. In this paper, we aim to generically address the three problems with a multimodal joint framework. We thus propose a multimodal multitask learning model based on the encoder-decoder architecture, termed M2Seq2Seq. At the heart of the encoder module are two attention mechanisms, i.e., intramodal (Ia) attention and intermodal (Ie) attention. Ia attention is designed to capture the contextual dependency between adjacent utterances, while Ie attention is designed to model multimodal interactions. For the decoder module, we design two kinds of multitask learning (MTL) decoders, i.e., single-level and multilevel decoders, to explore their potential. More specifically, the core of the single-level decoder is a masked outer-modal (Or) self-attention mechanism. The main motivation of Or attention is to explicitly model the interdependence among the tasks of sarcasm, sentiment and emotion recognition. The core of the multilevel decoder contains the shared gating and task-specific gating networks. Comprehensive experiments on four benchmark datasets, MUStARD, Memotion, CMU-MOSEI and MELD, prove the effectiveness of M2Seq2Seq over state-of-the-art baselines (e.g., CM-GCN, A-MTL), with significant improvements of 1.9%, 2.0%, 5.0%, 0.8%, 4.3%, 3.1%, 2.8%, 1.0%, 1.7% and 2.8% in terms of Micro F1.
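The abstract describes the architecture only at a high level. As a rough illustration, the PyTorch sketch below reconstructs the main ingredients it names (intramodal and intermodal attention in the encoder; shared plus task-specific gating and per-task heads in the decoder). It is not the authors' implementation: all layer choices, dimensions, the concatenation-based intermodal fusion and the gate design are assumptions made here for clarity.

import torch
import torch.nn as nn


class M2Seq2SeqSketch(nn.Module):
    """Illustrative-only sketch; every hyperparameter and layer choice is an assumption."""

    def __init__(self, dim=128, n_heads=4, n_sarcasm=2, n_sentiment=3, n_emotion=7):
        super().__init__()
        # Intramodal (Ia) attention: contextual dependency between utterances
        # within each single modality (text / audio / video).
        self.ia_text = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ia_audio = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ia_video = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Intermodal (Ie) attention: cross-modal interaction; here text queries a
        # concatenated audio/video stream (an assumed simplification).
        self.ie = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Multilevel-decoder flavour: one shared gate plus task-specific gates and
        # one classification head per task (sarcasm / sentiment / emotion).
        self.shared_gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.task_gates = nn.ModuleDict({
            task: nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
            for task in ("sarcasm", "sentiment", "emotion")
        })
        self.heads = nn.ModuleDict({
            "sarcasm": nn.Linear(dim, n_sarcasm),
            "sentiment": nn.Linear(dim, n_sentiment),
            "emotion": nn.Linear(dim, n_emotion),
        })

    def forward(self, text, audio, video):
        # Each input: (batch, n_utterances, dim) pre-extracted utterance features.
        t, _ = self.ia_text(text, text, text)
        a, _ = self.ia_audio(audio, audio, audio)
        v, _ = self.ia_video(video, video, video)
        av = torch.cat([a, v], dim=1)
        fused, _ = self.ie(t, av, av)              # (batch, n_utterances, dim)
        shared = self.shared_gate(fused) * fused   # shared representation
        return {task: self.heads[task](self.task_gates[task](shared) * shared)
                for task in self.heads}


# Toy usage: 2 dialogues, 10 utterances each, 128-d features per modality.
model = M2Seq2SeqSketch()
x = torch.randn(2, 10, 128)
outputs = model(x, x.clone(), x.clone())
print({task: logits.shape for task, logits in outputs.items()})

In training, the three heads would typically be optimized jointly, e.g., by summing one cross-entropy loss per task, which is the standard way such multitask decoders are supervised.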
Pages: 282-301 (20 pages)
Related papers (50 in total; entries [31]-[40] shown)
  • [31] DGSNet: Dual Graph Structure Network for Emotion Recognition in Multimodal Conversations. Tang, Shimin; Wang, Changjian; Tian, Fengyu; Xu, Kele; Xu, Minpeng. 2023 IEEE 35th International Conference on Tools with Artificial Intelligence (ICTAI), 2023: 78-85.
  • [32] MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations. Poria, Soujanya; Hazarika, Devamanyu; Majumder, Navonil; Naik, Gautam; Cambria, Erik; Mihalcea, Rada. 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), 2019: 527-536.
  • [33] Multitask, Multilabel, and Multidomain Learning With Convolutional Networks for Emotion Recognition. Pons, Gerard; Masip, David. IEEE Transactions on Cybernetics, 2022, 52(06): 4764-4771.
  • [34] Multitask Learning and Multistage Fusion for Dimensional Audiovisual Emotion Recognition. Atmaja, Bagus Tris; Akagi, Masato. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 4482-4486.
  • [35] An Emotion-Space Model of Multimodal Emotion Recognition. Choe, Kyung-Il. Advanced Science Letters, 2018, 24(01): 699-702.
  • [36] Supervised Adversarial Contrastive Learning for Emotion Recognition in Conversations. Hu, Dou; Bao, Yinan; Wei, Lingwei; Zhou, Wei; Hu, Songlin. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023): Long Papers, Vol 1, 2023: 10835-10852.
  • [37] Deep Emotional Arousal Network for Multimodal Sentiment Analysis and Emotion Recognition. Zhang, Feng; Li, Xi-Cheng; Lim, Chee Peng; Hua, Qiang; Dong, Chun-Ru; Zhai, Jun-Hai. Information Fusion, 2022, 88: 296-304.
  • [38] Deep emotional arousal network for multimodal sentiment analysis and emotion recognition. Zhang, F.; Li, X.-C.; Dong, C.-R.; Hua, Q. Kongzhi yu Juece/Control and Decision, 2022, 37(11): 2984-2992.
  • [39] Emotion Recognition Using Multimodal Deep Learning. Liu, Wei; Zheng, Wei-Long; Lu, Bao-Liang. Neural Information Processing, ICONIP 2016, Pt II, 2016, 9948: 521-529.
  • [40] Emotion Recognition on Multimodal with Deep Learning and Ensemble. Dharma, David Adi; Zahra, Amalia. International Journal of Advanced Computer Science and Applications, 2022, 13(12): 656-663.