Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation

Cited by: 3
Authors
Zou, Shihao [1 ]
Huang, Xianying [1 ]
Shen, Xudong [1 ]
Affiliations
[1] Chongqing Univ Technol, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
emotion recognition in conversation; multimodal prompt information; transformer; hybrid contrastive learning;
DOI
10.1145/3581783.3611805
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Emotion Recognition in Conversation (ERC) plays an important role in driving the development of human-machine interaction. Emotions can exist in multiple modalities, and multimodal ERC faces two main problems: (1) noise introduced during cross-modal information fusion, and (2) predicting emotion labels that have few samples and are semantically similar yet belong to different categories. To address these issues and fully exploit the features of each modality, we adopt the following strategies: first, we perform deep emotion-cue extraction on modalities with strong representational ability, and design feature filters that serve as multimodal prompt information for modalities with weak representational ability. We then design a Multimodal Prompt Transformer (MPT) to perform cross-modal information fusion. MPT embeds multimodal fusion information into each attention layer of the Transformer, allowing the prompt information to participate in encoding textual features and to be fused with multi-level textual information, yielding better multimodal fusion features. Finally, we use a Hybrid Contrastive Learning (HCL) strategy to improve the model's handling of labels with few samples: unsupervised contrastive learning strengthens the representation of the multimodal fusion features, while supervised contrastive learning mines the information of few-sample labels. Experimental results show that our proposed model outperforms state-of-the-art ERC models on two benchmark datasets.
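The abstract says HCL pairs an unsupervised contrastive term with a supervised one that treats same-label samples in a batch as positives. The paper does not give its exact formulation here; below is a minimal sketch of the supervised component only, assuming a standard SupCon-style loss over L2-normalized batch embeddings (the function name, temperature value, and masking details are illustrative, not the authors' implementation):

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """SupCon-style loss: for each anchor, pull together batch samples with
    the same label and push apart the rest. `features` is (batch, dim)."""
    # L2-normalize so dot products are cosine similarities
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature        # (batch, batch) logits
    # subtract the row max (the self-similarity 1/temperature) for stability
    sim = sim - sim.max(axis=1, keepdims=True)
    n = len(labels)
    mask_self = np.eye(n, dtype=bool)
    # exclude self-comparisons from the softmax denominator
    logits = np.where(mask_self, -np.inf, sim)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives: same label, not the anchor itself
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    has_pos = pos.sum(axis=1) > 0                    # anchors with >=1 positive
    # mean log-probability of positives per anchor, averaged over anchors
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return per_anchor.mean()
```

In a hybrid setup this term would be weighted together with an unsupervised InfoNCE term computed between two views of the fused representation; correct same-label pairings yield a lower value than mismatched ones, which is the signal exploited for few-sample labels.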
Pages: 5994-6003
Page count: 10
Related Papers
50 items total
  • [31] MEMOBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition
    Zhao, Jinming; Li, Ruichen; Jin, Qin; Wang, Xinchao; Li, Haizhou
    2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 4703-4707
  • [32] Contrastive Unsupervised Learning for Speech Emotion Recognition
    Li, Mao; Yang, Bo; Levy, Joshua; Stolcke, Andreas; Rozgic, Viktor; Matsoukas, Spyros; Papayiannis, Constantinos; Bone, Daniel; Wang, Chao
    2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021: 6329-6333
  • [33] Noise-Resistant Multimodal Transformer for Emotion Recognition
    Liu, Yuanyuan; Zhang, Haoyu; Zhan, Yibing; Chen, Zijing; Yin, Guanghao; Wei, Lin; Chen, Zhe
    International Journal of Computer Vision, 2025, 133(5): 3020-3040
  • [34] Multimodal Transformer Augmented Fusion for Speech Emotion Recognition
    Wang, Yuanyuan; Gu, Yu; Yin, Yifei; Han, Yingping; Zhang, He; Wang, Shuang; Li, Chenyu; Quan, Dou
    Frontiers in Neurorobotics, 2023, 17
  • [35] Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples
    Fan, Qi; Li, Yutong; Xin, Yi; Cheng, Xinyu; Gao, Guanglai; Ma, Miao
    Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (MRAC), 2024: 72-77
  • [36] Tri-CLT: Learning Tri-Modal Representations with Contrastive Learning and Transformer for Multimodal Sentiment Recognition
    Yang, Zhiyong; Li, Zijian; Zhu, Dongdong; Zhou, Yu
    Information Technology and Control, 2024, 53(1): 206-219
  • [37] Multimodal Graph Learning with Framelet-Based Stochastic Configuration Networks for Emotion Recognition in Conversation
    Shi, Jiandong; Li, Ming; Chen, Yuting; Cui, Lixin; Bai, Lu
    Information Sciences, 2025, 686
  • [38] Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss
    Franceschini, Riccardo; Fini, Enrico; Beyan, Cigdem; Conti, Alessandro; Arrigoni, Federica; Ricci, Elisa
    2022 26th International Conference on Pattern Recognition (ICPR), 2022: 2589-2596
  • [39] KoHMT: A Multimodal Emotion Recognition Model Integrating KoELECTRA, HuBERT with Multimodal Transformer
    Yi, Moung-Ho; Kwak, Keun-Chang; Shin, Ju-Hyun
    Electronics, 2024, 13(23)
  • [40] Improving Unimodal Object Recognition with Multimodal Contrastive Learning
    Meyer, Johannes; Eitel, Andreas; Brox, Thomas; Burgard, Wolfram
    2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020: 5656-5663