Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation

Cited by: 3
Authors
Zou, Shihao [1 ]
Huang, Xianying [1 ]
Shen, Xudong [1 ]
Affiliations
[1] Chongqing Univ Technol, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
emotion recognition in conversation; multimodal prompt information; transformer; hybrid contrastive learning;
DOI
10.1145/3581783.3611805
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion Recognition in Conversation (ERC) plays an important role in driving the development of human-machine interaction. Emotions can exist in multiple modalities, and multimodal ERC mainly faces two problems: (1) noise introduced during cross-modal information fusion, and (2) the difficulty of predicting emotion labels that have few samples and are semantically similar to, yet belong to different categories from, other labels. To address these issues and fully utilize the features of each modality, we adopt the following strategies: first, we extract deep emotion cues from modalities with strong representation ability, and design feature filters that turn modalities with weak representation ability into multimodal prompt information. Then, we design a Multimodal Prompt Transformer (MPT) to perform cross-modal information fusion. MPT embeds multimodal fusion information into each attention layer of the Transformer, allowing the prompt information to participate in encoding textual features and to be fused with multi-level textual information, yielding better multimodal fusion features. Finally, we use a Hybrid Contrastive Learning (HCL) strategy to improve the model's handling of labels with few samples: unsupervised contrastive learning strengthens the representation ability of the multimodal fusion, while supervised contrastive learning mines the information carried by few-sample labels. Experimental results show that our proposed model outperforms state-of-the-art models on two ERC benchmark datasets.
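The supervised half of the HCL strategy described in the abstract treats samples sharing an emotion label as positives for one another, pulling their fused representations together. A minimal NumPy sketch of such a supervised contrastive loss is shown below; this is an illustration of the general SupCon-style objective, not the paper's exact implementation, and the function name, temperature value, and toy data are assumptions. In the paper's hybrid setup this term would be combined with an unsupervised (augmentation-based, SimCLR-style) contrastive term.

```python
import numpy as np

def sup_con_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss over a batch of fused embeddings.

    For each anchor, positives are all other samples with the same label;
    the loss is the negative mean log-probability of those positives under
    a softmax over all non-anchor samples.
    """
    # L2-normalize so dot products become cosine similarities
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T / temperature          # pairwise scaled similarities
    n = len(labels)
    mask_self = np.eye(n, dtype=bool)
    sim = np.where(mask_self, -np.inf, sim)      # exclude self-comparisons
    # row-wise log-softmax over the remaining samples
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    # average log-probability of positives per anchor (skip anchors with none)
    per_anchor = [log_prob[i, pos[i]].mean() for i in range(n) if pos[i].any()]
    return -np.mean(per_anchor)
```

When same-label embeddings are already close, each anchor assigns high probability to its positives and the loss is small; assigning the same embeddings mismatched labels drives the loss up, which is the pressure that separates semantically similar but differently labeled emotions.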
Pages: 5994-6003
Page count: 10
Related Papers (50 total)
  • [1] Multimodal Knowledge-enhanced Interactive Network with Mixed Contrastive Learning for Emotion Recognition in Conversation
    Shen, Xudong
    Huang, Xianying
    Zou, Shihao
    Gan, Xinyi
    NEUROCOMPUTING, 2024, 582
  • [2] Emotion Recognition in Conversation Using ERC Roberta with Prompt Learning
    Gong, Q.
    Yu, K.
    Wu, X.
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2023, 46 (05): : 106 - 111, 138
  • [3] Unlocking the Power of Multimodal Learning for Emotion Recognition in Conversation
    Wang, Yunxiao
    Liu, Meng
    Li, Zhe
    Hu, Yupeng
    Luo, Xin
    Nie, Liqiang
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5947 - 5955
  • [4] Improving multimodal fusion with Main Modal Transformer for emotion recognition in conversation
    Zou, Shihao
    Huang, Xianying
    Shen, Xudong
    Liu, Hankai
    KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [5] Hybrid Curriculum Learning for Emotion Recognition in Conversation
    Yang, Lin
    Shen, Yi
    Mao, Yue
    Cai, Longjun
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 11595 - 11603
  • [6] Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup
    Cai, Yunrui
    Ye, Runchuan
    Xie, Jingran
    Zhou, Yixuan
    Xu, Yaoxun
    Wu, Zhiyong
    PROCEEDINGS OF THE 2ND INTERNATIONAL WORKSHOP ON MULTIMODAL AND RESPONSIBLE AFFECTIVE COMPUTING, MRAC 2024, 2024, : 93 - 97
  • [7] MM-NodeFormer: Node Transformer Multimodal Fusion for Emotion Recognition in Conversation
    Huang, Zilong
    Mak, Man-Wai
    Lee, Kong Aik
    INTERSPEECH 2024, 2024, : 4069 - 4073
  • [8] Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition
    Guo, Zirun
    Jin, Tao
    Zhao, Zhou
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1726 - 1736
  • [9] Multimodal Prompt Learning in Emotion Recognition Using Context and Audio Information
    Jeong, Eunseo
    Kim, Gyunyeop
    Kang, Sangwoo
    MATHEMATICS, 2023, 11 (13)