A multitask learning model for multimodal sarcasm, sentiment and emotion recognition in conversations

Cited by: 48
Authors
Zhang, Yazhou [1 ]
Wang, Jinglin [2 ]
Liu, Yaochen [2 ]
Rong, Lu [1 ]
Zheng, Qian [1 ]
Song, Dawei [2 ]
Tiwari, Prayag [3 ]
Qin, Jing [4 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 450002, Peoples R China
[2] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[3] Halmstad Univ, Sch Informat Technol, Halmstad, Sweden
[4] Hong Kong Polytech Univ, Ctr Smart Hlth, Sch Nursing, Hong Kong, Peoples R China
Funding
US National Science Foundation;
Keywords
Multimodal sarcasm recognition; Sentiment analysis; Emotion recognition; Multitask learning; Affective computing; INTERACTION DYNAMICS;
DOI
10.1016/j.inffus.2023.01.005
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sarcasm, sentiment and emotion are tightly coupled, in that each helps the understanding of the others, which makes their joint recognition in conversation a focus of research in artificial intelligence (AI) and affective computing. Three main challenges exist: context dependency, multimodal fusion and multitask interaction. However, most existing works fail to explicitly leverage and model the relationships among related tasks. In this paper, we aim to address these three challenges generically within a multimodal joint framework. We thus propose a multimodal multitask learning model based on the encoder-decoder architecture, termed M2Seq2Seq. At the heart of the encoder module are two attention mechanisms, i.e., intramodal (Ia) attention and intermodal (Ie) attention. Ia attention is designed to capture the contextual dependency between adjacent utterances, while Ie attention is designed to model multimodal interactions. On the decoder side, we design two kinds of multitask learning (MTL) decoders, i.e., single-level and multilevel decoders, to explore their potential. More specifically, the core of the single-level decoder is a masked outer-modal (Or) self-attention mechanism, whose main motivation is to explicitly model the interdependence among the tasks of sarcasm, sentiment and emotion recognition. The core of the multilevel decoder contains shared gating and task-specific gating networks. Comprehensive experiments on four benchmark datasets, MUStARD, Memotion, CMU-MOSEI and MELD, demonstrate the effectiveness of M2Seq2Seq over state-of-the-art baselines (e.g., CM-GCN, A-MTL), with significant improvements of 1.9%, 2.0%, 5.0%, 0.8%, 4.3%, 3.1%, 2.8%, 1.0%, 1.7% and 2.8% in terms of Micro F1.
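To make the encoder-decoder design above concrete, here is a minimal PyTorch sketch of the idea, under stated assumptions rather than the authors' implementation: the module name M2Seq2SeqSketch, the hidden size, the per-task class counts (binary sarcasm, 3-way sentiment, 7-way emotion) and the exact masking scheme of the Or attention are illustrative guesses, and the multilevel decoder's shared/task-specific gating networks are omitted.

    # Illustrative sketch, not the paper's code: one encoder pass with intramodal
    # (Ia) and intermodal (Ie) attention, followed by the single-level decoder's
    # masked outer-modal (Or) self-attention over one learnable query per task.
    import torch
    import torch.nn as nn

    class M2Seq2SeqSketch(nn.Module):
        def __init__(self, dim=128, n_heads=4, n_classes=(2, 3, 7)):
            super().__init__()
            # Ia attention: contextual dependency between utterances within a
            # modality (applied here to the text stream only, for brevity).
            self.ia_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
            # Ie attention: text attends over the concatenated audio/visual streams.
            self.ie_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
            # One learnable query per task: sarcasm, sentiment, emotion.
            self.task_queries = nn.Parameter(torch.randn(len(n_classes), dim))
            self.or_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
            self.heads = nn.ModuleList(nn.Linear(dim, c) for c in n_classes)

        def forward(self, text, audio, visual):
            # Each input: (batch, n_utterances, dim), assumed pre-aligned.
            t, _ = self.ia_attn(text, text, text)            # intramodal context
            av = torch.cat([audio, visual], dim=1)
            fused, _ = self.ie_attn(t, av, av)               # intermodal fusion
            q = self.task_queries.unsqueeze(0).expand(fused.size(0), -1, -1)
            mem = torch.cat([q, fused], dim=1)
            # Boolean mask (True = blocked): each task query ignores itself, so
            # its representation is built from the other tasks plus the fused
            # context, i.e., the cross-task interdependence the abstract targets.
            n_t = q.size(1)
            eye = torch.eye(n_t, dtype=torch.bool, device=q.device)
            pad = torch.zeros(n_t, fused.size(1), dtype=torch.bool, device=q.device)
            mask = torch.cat([eye, pad], dim=1)
            task_repr, _ = self.or_attn(q, mem, mem, attn_mask=mask)
            return [head(task_repr[:, i]) for i, head in enumerate(self.heads)]

    # Usage: a batch of 2 dialogues with 5 utterances per modality stream.
    model = M2Seq2SeqSketch()
    x = torch.randn(2, 5, 128)
    logits = model(x, torch.randn(2, 5, 128), torch.randn(2, 5, 128))
    print([o.shape for o in logits])  # [(2, 2), (2, 3), (2, 7)] task logits

The self-masking here is only one plausible reading of "masked" Or attention; what it illustrates is the stated motivation, namely that each task head must borrow evidence from its sibling tasks rather than decide in isolation.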
Pages: 282-301
Number of pages: 20