Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation

Cited by: 3
Authors
Zou, Shihao [1 ]
Huang, Xianying [1 ]
Shen, Xudong [1 ]
Affiliations
[1] Chongqing Univ Technol, Chongqing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
emotion recognition in conversation; multimodal prompt information; transformer; hybrid contrastive learning;
DOI
10.1145/3581783.3611805
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Emotion Recognition in Conversation (ERC) plays an important role in driving the development of human-machine interaction. Emotions can exist in multiple modalities, and multimodal ERC faces two main problems: (1) noise introduced during cross-modal information fusion, and (2) predicting emotion labels that have few samples and are semantically similar yet belong to different categories. To address these issues and fully exploit the features of each modality, we adopt the following strategies: first, we perform deep emotion-cue extraction on modalities with strong representational ability, and design feature filters that serve as multimodal prompt information for modalities with weak representational ability. We then design a Multimodal Prompt Transformer (MPT) to perform cross-modal information fusion. MPT embeds multimodal fusion information into each attention layer of the Transformer, allowing the prompt information to participate in encoding textual features and to be fused with multi-level textual information, yielding better multimodal fusion features. Finally, we use a Hybrid Contrastive Learning (HCL) strategy to improve the model's handling of labels with few samples: unsupervised contrastive learning strengthens the representation of the multimodal fusion features, while supervised contrastive learning mines the information of few-sample labels. Experimental results show that our proposed model outperforms state-of-the-art ERC models on two benchmark datasets.
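The abstract says HCL pairs an unsupervised contrastive term with a supervised one that treats same-label samples in a batch as positives. The paper does not give its exact formulation here; below is a minimal sketch of the supervised component only, assuming a standard SupCon-style loss over L2-normalized batch embeddings (the function name, temperature value, and masking details are illustrative, not the authors' implementation):

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """SupCon-style loss: for each anchor, pull together batch samples with
    the same label and push apart the rest. `features` is (batch, dim)."""
    # L2-normalize so dot products are cosine similarities
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature        # (batch, batch) logits
    # subtract the row max (the self-similarity 1/temperature) for stability
    sim = sim - sim.max(axis=1, keepdims=True)
    n = len(labels)
    mask_self = np.eye(n, dtype=bool)
    # exclude self-comparisons from the softmax denominator
    logits = np.where(mask_self, -np.inf, sim)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives: same label, not the anchor itself
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    has_pos = pos.sum(axis=1) > 0                    # anchors with >=1 positive
    # mean log-probability of positives per anchor, averaged over anchors
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / pos.sum(axis=1)[has_pos]
    return per_anchor.mean()
```

In a hybrid setup this term would be weighted together with an unsupervised InfoNCE term computed between two views of the fused representation; correct same-label pairings yield a lower value than mismatched ones, which is the signal exploited for few-sample labels.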
Pages: 5994-6003
Page count: 10
Related Papers
50 items total
  • [31] MEMOBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition
    Zhao, Jinming; Li, Ruichen; Jin, Qin; Wang, Xinchao; Li, Haizhou
    2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 4703-4707
  • [32] Contrastive Unsupervised Learning for Speech Emotion Recognition
    Li, Mao; Yang, Bo; Levy, Joshua; Stolcke, Andreas; Rozgic, Viktor; Matsoukas, Spyros; Papayiannis, Constantinos; Bone, Daniel; Wang, Chao
    2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021: 6329-6333
  • [33] Noise-Resistant Multimodal Transformer for Emotion Recognition
    Liu, Yuanyuan; Zhang, Haoyu; Zhan, Yibing; Chen, Zijing; Yin, Guanghao; Wei, Lin; Chen, Zhe
    International Journal of Computer Vision, 2025, 133(5): 3020-3040
  • [34] Multimodal Transformer Augmented Fusion for Speech Emotion Recognition
    Wang, Yuanyuan; Gu, Yu; Yin, Yifei; Han, Yingping; Zhang, He; Wang, Shuang; Li, Chenyu; Quan, Dou
    Frontiers in Neurorobotics, 2023, 17
  • [35] Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples
    Fan, Qi; Li, Yutong; Xin, Yi; Cheng, Xinyu; Gao, Guanglai; Ma, Miao
    Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing (MRAC), 2024: 72-77
  • [36] Tri-CLT: Learning Tri-Modal Representations with Contrastive Learning and Transformer for Multimodal Sentiment Recognition
    Yang, Zhiyong; Li, Zijian; Zhu, Dongdong; Zhou, Yu
    Information Technology and Control, 2024, 53(1): 206-219
  • [37] Multimodal Graph Learning with Framelet-Based Stochastic Configuration Networks for Emotion Recognition in Conversation
    Shi, Jiandong; Li, Ming; Chen, Yuting; Cui, Lixin; Bai, Lu
    Information Sciences, 2025, 686
  • [38] Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss
    Franceschini, Riccardo; Fini, Enrico; Beyan, Cigdem; Conti, Alessandro; Arrigoni, Federica; Ricci, Elisa
    2022 26th International Conference on Pattern Recognition (ICPR), 2022: 2589-2596
  • [39] KoHMT: A Multimodal Emotion Recognition Model Integrating KoELECTRA, HuBERT with Multimodal Transformer
    Yi, Moung-Ho; Kwak, Keun-Chang; Shin, Ju-Hyun
    Electronics, 2024, 13(23)
  • [40] Improving Unimodal Object Recognition with Multimodal Contrastive Learning
    Meyer, Johannes; Eitel, Andreas; Brox, Thomas; Burgard, Wolfram
    2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020: 5656-5663