Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation

Cited by: 3
Authors
Zou, Shihao [1 ]
Huang, Xianying [1 ]
Shen, Xudong [1 ]
Affiliations
[1] Chongqing Univ Technol, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
emotion recognition in conversation; multimodal prompt information; transformer; hybrid contrastive learning;
DOI
10.1145/3581783.3611805
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion Recognition in Conversation (ERC) plays an important role in driving the development of human-machine interaction. Emotions can exist in multiple modalities, and multimodal ERC faces two main problems: (1) noise introduced during cross-modal information fusion, and (2) the prediction of emotion labels that have few samples and are semantically similar yet belong to different categories. To address these issues and make full use of the features of each modality, we adopt the following strategies: first, deep emotion cues are extracted from the modalities with strong representational ability, and feature filters are designed for the modalities with weak representational ability to serve as multimodal prompt information. Then, we design a Multimodal Prompt Transformer (MPT) to perform cross-modal information fusion. MPT embeds multimodal fusion information into each attention layer of the Transformer, allowing the prompt information to participate in encoding textual features and to be fused with multi-level textual information, which yields better multimodal fusion features. Finally, we use a Hybrid Contrastive Learning (HCL) strategy to optimize the model's ability to handle labels with few samples: unsupervised contrastive learning improves the representational ability of the multimodal fusion, while supervised contrastive learning mines the information carried by labels with few samples. Experimental results show that our proposed model outperforms state-of-the-art ERC models on two benchmark datasets.
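The abstract describes two mechanisms concretely enough to illustrate in code. First, MPT embeds multimodal fusion information into each attention layer so that prompt information participates in encoding textual features. Below is a minimal PyTorch sketch under a prefix-style reading of that idea, in which the fused audio/visual features are projected into prompt tokens that extend the attention keys and values; the class name, the number of prompt tokens, and the projection are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    class PromptAttentionLayer(nn.Module):
        # One Transformer layer in which multimodal prompt vectors join the
        # keys/values of self-attention (assumed prefix-style injection).
        def __init__(self, d_model=768, n_heads=8, n_prompts=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                     nn.Linear(4 * d_model, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            # hypothetical projection from the fused multimodal vector to prompt tokens
            self.to_prompt = nn.Linear(d_model, n_prompts * d_model)
            self.n_prompts = n_prompts

        def forward(self, text, fused):
            # text: (B, T, D) token features; fused: (B, D) multimodal fusion vector
            B, _, D = text.shape
            prompts = self.to_prompt(fused).view(B, self.n_prompts, D)
            kv = torch.cat([prompts, text], dim=1)  # prompts extend keys/values
            attn_out, _ = self.attn(text, kv, kv)   # queries stay purely textual
            x = self.norm1(text + attn_out)
            return self.norm2(x + self.ffn(x))

Second, HCL combines an unsupervised contrastive term, which strengthens the fused representation, with a supervised contrastive term, which mines few-sample labels. A sketch of the two losses, assuming an NT-Xent form for the unsupervised term and the standard supervised contrastive form; the mixing weights alpha and beta and the two-view setup v1/v2 are assumed hyperparameters, not taken from the paper.

    import torch
    import torch.nn.functional as F

    def unsup_contrastive_loss(z1, z2, tau=0.07):
        # NT-Xent between two views z1, z2 of shape (B, D)
        B = z1.size(0)
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D)
        sim = z @ z.t() / tau
        sim.fill_diagonal_(float('-inf'))                   # drop self-similarity
        targets = torch.cat([torch.arange(B, 2 * B),        # positive of i is i + B
                             torch.arange(0, B)]).to(z.device)
        return F.cross_entropy(sim, targets)

    def sup_contrastive_loss(z, labels, tau=0.07):
        # supervised contrastive loss: same-label samples act as positives
        z = F.normalize(z, dim=1)
        self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        sim = (z @ z.t() / tau).masked_fill(self_mask, float('-inf'))
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        sum_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
        n_pos = pos_mask.sum(1).clamp(min=1)                # avoid divide-by-zero
        return -(sum_pos / n_pos).mean()

    # hybrid objective (alpha, beta are assumed weights):
    # loss = ce + alpha * unsup_contrastive_loss(v1, v2) + beta * sup_contrastive_loss(z, y)

Both sketches assume fused features of the same width as the text encoder; the paper's actual feature filters, fusion operator, and loss weighting may differ.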
Pages: 5994 - 6003
Page count: 10
Related Papers
50 records in total
  • [21] C-BGA: Multimodal Speech Emotion Recognition Network Combining Contrastive Learning
    Miao, Borui
    Xu, Yunfeng
    Zhao, Shaojie
    Wang, Jialin
    Computer Engineering and Applications, 60 (16): 168 - 176
  • [22] Self-supervised representation learning using multimodal Transformer for emotion recognition
    Goetz, Theresa
    Arora, Pulkit
    Erick, F. X.
    Holzer, Nina
    Sawant, Shrutika
    PROCEEDINGS OF THE 8TH INTERNATIONAL WORKSHOP ON SENSOR-BASED ACTIVITY RECOGNITION AND ARTIFICIAL INTELLIGENCE, IWOAR 2023, 2023
  • [23] Contextual Information and Commonsense Based Prompt for Emotion Recognition in Conversation
    Yi, Jingjie
    Yang, Deqing
    Yuan, Siyu
    Cao, Kaiyan
    Zhang, Zhiyao
    Xiao, Yanghua
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT II, 2023, 13714: 707 - 723
  • [24] Modeling Hierarchical Uncertainty for Multimodal Emotion Recognition in Conversation
    Chen, Feiyu
    Shao, Jie
    Zhu, Anjie
    Ouyang, Deqiang
    Liu, Xueliang
    Shen, Heng Tao
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (01): 187 - 198
  • [25] A Contextual Attention Network for Multimodal Emotion Recognition in Conversation
    Wang, Tana
    Hou, Yaqing
    Zhou, Dongsheng
    Zhang, Qiang
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [26] Interactive Multimodal Attention Network for Emotion Recognition in Conversation
    Ren, Minjie
    Huang, Xiangdong
    Shi, Xiaoqi
    Nie, Weizhi
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28: 1046 - 1050
  • [27] Bilevel Relational Graph Representation Learning-based Multimodal Emotion Recognition in Conversation
    Zhao, Huan
    Ju, Yi
    Gao, Yingxue
    2024 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME 2024, 2024
  • [28] HyFusER: Hybrid Multimodal Transformer for Emotion Recognition Using Dual Cross Modal Attention
    Yi, Moung-Ho
    Kwak, Keun-Chang
    Shin, Ju-Hyun
    APPLIED SCIENCES-BASEL, 2025, 15 (03)
  • [29] Learning What and When to Drop: Adaptive Multimodal and Contextual Dynamics for Emotion Recognition in Conversation
    Chen, Feiyu
    Sun, Zhengxiao
    Ouyang, Deqiang
    Liu, Xueliang
    Shao, Jie
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 1064 - 1073
  • [30] Bi-stream graph learning based multimodal fusion for emotion recognition in conversation
    Lu, Nannan
    Han, Zhiyuan
    Han, Min
    Qian, Jiansheng
    INFORMATION FUSION, 2024, 106