Enhancing Emotion Recognition in Conversation Through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

Cited by: 0
|
Authors
Shi, Haoxiang [1 ,2 ]
Zhang, Xulong [1 ]
Cheng, Ning [1 ]
Zhang, Yong [1 ]
Yu, Jun [2 ]
Xiao, Jing [1 ]
Wang, Jianzong [1 ]
Affiliations
[1] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024 | 2024 / Vol. 14877
Keywords
Emotion recognition; Multi-modal fusion; Contrastive learning
DOI
10.1007/978-981-97-5669-8_32
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple concatenation for cross-modal fusion and ignored the information differences between modalities, so the model could not focus on modality-specific emotional information; at the same time, the information shared between modalities was left unprocessed, causing an information redundancy problem in emotion prediction. To overcome these limitations, we propose a cross-modal fusion emotion prediction network based on vector connections. The network comprises two stages: a multi-modal feature fusion stage based on connection vectors and an emotion classification stage based on the fused features. Furthermore, we design a supervised inter-class contrastive learning module based on emotion labels. Experimental results confirm the effectiveness of the proposed method, which demonstrates excellent performance on the IEMOCAP and MELD datasets.
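The abstract describes two generic components: fusing per-modality feature vectors, and a supervised inter-class contrastive objective that groups utterances by emotion label. The following minimal sketch is not the authors' implementation; all function names, dimensions, and the temperature value are illustrative assumptions. It shows plain concatenation-based fusion and a supervised contrastive loss in the style of Khosla et al. (2020), where positives are other samples sharing the same emotion label.

```python
import numpy as np

def fuse_modalities(text_feat, audio_feat, visual_feat):
    """Stage 1 (sketch): concatenate per-modality feature vectors
    into a single fused representation."""
    return np.concatenate([text_feat, audio_feat, visual_feat], axis=-1)

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised (inter-class) contrastive loss over a batch of fused features.

    For each anchor, positives are all other samples with the same
    emotion label; all remaining samples act as negatives.
    """
    # L2-normalize rows so dot products are cosine similarities.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T / temperature
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors with no same-label partner contribute nothing
        others = [j for j in range(n) if j != i]
        # log of the softmax denominator over all non-anchor samples
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        # average negative log-likelihood over the anchor's positives
        loss += -np.mean([sim[i, j] - log_denom for j in positives])
        count += 1
    return loss / max(count, 1)
```

Minimizing this loss pulls fused features of same-emotion utterances together and pushes different-emotion utterances apart, which is the stated goal of the inter-class contrastive module; a batch whose same-label features already coincide yields a near-zero loss.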
Pages: 391-401 (11 pages)