A multitask learning model for multimodal sarcasm, sentiment and emotion recognition in conversations

Cited by: 48
Authors
Zhang, Yazhou [1 ]
Wang, Jinglin [2 ]
Liu, Yaochen [2 ]
Rong, Lu [1 ]
Zheng, Qian [1 ]
Song, Dawei [2 ]
Tiwari, Prayag [3 ]
Qin, Jing [4 ]
Affiliations
[1] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou 450002, Peoples R China
[2] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[3] Halmstad Univ, Sch Informat Technol, Halmstad, Sweden
[4] Hong Kong Polytech Univ, Ctr Smart Hlth, Sch Nursing, Hong Kong, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Multimodal sarcasm recognition; Sentiment analysis; Emotion recognition; Multitask learning; Affective computing; INTERACTION DYNAMICS;
DOI
10.1016/j.inffus.2023.01.005
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Sarcasm, sentiment and emotion are tightly coupled, in that each helps the understanding of the others, which makes their joint recognition in conversation a focus of research in artificial intelligence (AI) and affective computing. Three main challenges exist: context dependency, multimodal fusion and multitask interaction. However, most existing works fail to explicitly leverage and model the relationships among related tasks. In this paper, we aim to address the three problems in a generic way with a multimodal joint framework. We thus propose a multimodal multitask learning model based on the encoder-decoder architecture, termed M2Seq2Seq. At the heart of the encoder module are two attention mechanisms, i.e., intramodal (Ia) attention and intermodal (Ie) attention. Ia attention is designed to capture the contextual dependency between adjacent utterances, while Ie attention is designed to model multimodal interactions. On the decoder side, we design two kinds of multitask learning (MTL) decoders, i.e., single-level and multilevel decoders, to explore their potential. More specifically, the core of the single-level decoder is a masked outer-modal (Or) self-attention mechanism, whose main motivation is to explicitly model the interdependence among the tasks of sarcasm, sentiment and emotion recognition. The core of the multilevel decoder contains shared gating and task-specific gating networks. Comprehensive experiments on four benchmark datasets, MUStARD, Memotion, CMU-MOSEI and MELD, prove the effectiveness of M2Seq2Seq over state-of-the-art baselines (e.g., CM-GCN, A-MTL), with significant improvements of 1.9%, 2.0%, 5.0%, 0.8%, 4.3%, 3.1%, 2.8%, 1.0%, 1.7% and 2.8% in terms of Micro F1.
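The abstract describes the architecture only at a high level. The minimal PyTorch sketch below illustrates how the named pieces could be wired together: intramodal (Ia) self-attention over the utterance sequence of each modality, intermodal (Ie) cross-attention between modality streams, and a gated multitask decoder emitting sarcasm, sentiment and emotion logits. Everything concrete here is an assumption for illustration, not the authors' implementation: the two-modality (text/audio) setup, the class counts, the exact gating formulation, and all names such as M2Seq2SeqSketch. The paper's actual model also covers the visual modality and the masked outer-modal (Or) self-attention of the single-level decoder, both omitted here.

```python
import torch
import torch.nn as nn


class M2Seq2SeqSketch(nn.Module):
    """Illustrative (hypothetical) sketch of the M2Seq2Seq flow:
    Ia attention within each modality, Ie attention across modalities,
    then shared + task-specific gating for the three tasks."""

    def __init__(self, dim=128, heads=4, n_classes=(2, 3, 7)):
        # n_classes is an assumption: binary sarcasm, 3-way sentiment,
        # 7-way emotion (as in MELD).
        super().__init__()
        # Intramodal (Ia) attention: self-attention over the utterance
        # sequence, one per modality, capturing context between utterances.
        self.ia_text = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ia_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Intermodal (Ie) attention: text queries attend to audio keys/values.
        self.ie = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Multilevel decoder: a shared gate plus one gate and one
        # classification head per task.
        self.shared_gate = nn.Linear(2 * dim, dim)
        self.task_gates = nn.ModuleList([nn.Linear(2 * dim, dim) for _ in n_classes])
        self.task_heads = nn.ModuleList([nn.Linear(dim, c) for c in n_classes])

    def forward(self, text, audio):
        # text, audio: (batch, n_utterances, dim) utterance embeddings.
        t, _ = self.ia_text(text, text, text)      # Ia: context within text
        a, _ = self.ia_audio(audio, audio, audio)  # Ia: context within audio
        m, _ = self.ie(t, a, a)                    # Ie: cross-modal interaction
        fused = torch.cat([t, m], dim=-1)          # unimodal + cross-modal view
        shared = torch.sigmoid(self.shared_gate(fused)) * t  # shared gating
        outputs = []
        for gate, head in zip(self.task_gates, self.task_heads):
            specific = torch.sigmoid(gate(fused)) * m   # task-specific gating
            outputs.append(head(shared + specific))     # per-utterance logits
        return outputs  # [sarcasm, sentiment, emotion] logits


# Toy usage: 2 conversations, 5 utterances each, 128-d features per modality.
model = M2Seq2SeqSketch()
text = torch.randn(2, 5, 128)
audio = torch.randn(2, 5, 128)
sarcasm, sentiment, emotion = model(text, audio)
print(sarcasm.shape, sentiment.shape, emotion.shape)
# torch.Size([2, 5, 2]) torch.Size([2, 5, 3]) torch.Size([2, 5, 7])
```

The sigmoid-gated fusion used here is a common generic pattern; the paper's own shared and task-specific gating networks may differ in detail.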
Pages: 282-301
Number of pages: 20
Related Papers
50 records in total
  • [1] Sentiment and Sarcasm Classification With Multitask Learning
    Majumder, Navonil
    Poria, Soujanya
    Peng, Haiyun
    Chhaya, Niyati
    Cambria, Erik
    Gelbukh, Alexander
    IEEE INTELLIGENT SYSTEMS, 2019, 34 (03): 38-43
  • [2] Learning Multitask Commonness and Uniqueness for Multimodal Sarcasm Detection and Sentiment Analysis in Conversation
    Zhang Y.
    Yu Y.
    Zhao D.
    Li Z.
    Wang B.
    Hou Y.
    Tiwari P.
    Qin J.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (03): 1349-1361
  • [3] Hybrid Quantum-Classical Neural Network for Multimodal Multitask Sarcasm, Emotion, and Sentiment Analysis
    Phukan, Arpan
    Pal, Santanu
    Ekbal, Asif
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (05): 5740-5750
  • [4] A Multimodal Corpus for Emotion Recognition in Sarcasm
    Ray, Anupama
    Mishra, Shubham
    Nunna, Apoorva
    Bhattacharyya, Pushpak
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022: 6992-7003
  • [5] Mathematical representation of emotion using multimodal recognition model with deep multitask learning
    Harata S.
    Sakuma T.
    Kato S.
    IEEJ Transactions (Institute of Electrical Engineers of Japan), 2020, 140: 1343-1351
  • [6] Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations
    Meng, Tao
    Shou, Yuntao
    Ai, Wei
    Yin, Nan
    Li, Keqin
    IEEE Transactions on Artificial Intelligence, 2024, 5 (12): 6472-6487
  • [7] Quantum neural networks for multimodal sentiment, emotion, and sarcasm analysis
    Singh, Jaiteg
    Bhangu, Kamalpreet Singh
    Alkhanifer, Abdulrhman
    Alzubi, Ahmad Ali
    Ali, Farman
    ALEXANDRIA ENGINEERING JOURNAL, 2025, 124: 170-187
  • [8] Multimodal Sentiment Analysis: A Multitask Learning Approach
    Fortin, Mathieu Page
    Chaib-draa, Brahim
    ICPRAM: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2019: 368-376
  • [9] A Multitask Learning Framework for Multimodal Sentiment Analysis
    Jiang, Dazhi
    Wei, Runguo
    Liu, Hao
    Wen, Jintao
    Tu, Geng
    Zheng, Lin
    Cambria, Erik
    21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021: 151-157
  • [10] MMTrans-MT: A Framework for Multimodal Emotion Recognition Using Multitask Learning
    Shen, Jinrui
    Zheng, Jiahao
    Wang, Xiaoping
    2021 13TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2021: 52-59