Enhancing Emotion Recognition in Conversation Through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

Citations: 0
|
Authors
Shi, Haoxiang [1 ,2 ]
Zhang, Xulong [1 ]
Cheng, Ning [1 ]
Zhang, Yong [1 ]
Yu, Jun [2 ]
Xiao, Jing [1 ]
Wang, Jianzong [1 ]
Affiliations
[1] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024 | 2024 / Vol. 14877
Keywords
Emotion recognition; Multi-modal fusion; Contrastive learning
DOI
10.1007/978-981-97-5669-8_32
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple concatenation for cross-modal fusion and ignored the information differences between modalities, so the model could not focus on modality-specific emotional information; at the same time, the information shared between modalities was left unprocessed, producing an information-redundancy problem in emotion prediction. To overcome these limitations, we propose a cross-modal fusion emotion prediction network based on vector connections. The network comprises two stages: a multi-modal feature fusion stage based on connection vectors and an emotion classification stage based on the fused features. Furthermore, we design a supervised inter-class contrastive learning module based on emotion labels. Experimental results confirm the effectiveness of the proposed method, demonstrating excellent performance on the IEMOCAP and MELD datasets.
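The abstract's supervised inter-class contrastive module can be illustrated with a minimal sketch of a label-supervised contrastive loss (in the style of Khosla et al.'s SupCon): same-emotion utterance embeddings are pulled together and different-emotion ones pushed apart. The function name, the temperature `tau`, and the assumption of L2-normalized embeddings are illustrative choices, not details from the paper.

```python
import math

def sup_con_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss over a batch of L2-normalized embeddings.

    For each anchor i, every other sample with the same label is a positive;
    the loss is the mean negative log-probability of picking a positive
    under a temperature-scaled softmax over all other samples.
    """
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        # temperature-scaled dot-product similarity to every sample
        sims = [sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / tau
                for j in range(n)]
        # softmax denominator excludes the anchor itself
        denom = sum(math.exp(sims[j]) for j in range(n) if j != i)
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        loss_i = -sum(math.log(math.exp(sims[p]) / denom) for p in positives)
        total += loss_i / len(positives)
        count += 1
    return total / max(count, 1)
```

On a toy batch, embeddings clustered by emotion label yield a much smaller loss than embeddings where the two classes are interleaved, which is the inter-class separation effect the abstract describes.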
Pages: 391 - 401 (11 pages)
Related Papers (50 in total)
  • [1] Cross-modal contrastive learning for multimodal sentiment recognition
    Yang, Shanliang
    Cui, Lichao
    Wang, Lei
    Wang, Tao
    APPLIED INTELLIGENCE, 2024, 54 (05) : 4260 - 4276
  • [3] A Cross-Modal Correlation Fusion Network for Emotion Recognition in Conversations
    Tang, Xiaolyu
    Cai, Guoyong
    Chen, Ming
    Yuan, Peicong
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT V, NLPCC 2024, 2025, 15363 : 55 - 68
  • [4] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
    Cao, Xiaopeng
    Zhang, Linying
    Chen, Qiuxian
    Ning, Hailong
    Dong, Yizhuo
    THE JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOMMUNICATIONS, 2024, 31 (06) : 16 - 25
  • [5] Cross-Modal Dynamic Transfer Learning for Multimodal Emotion Recognition
    Hong, Soyeon
    Kang, Hyeoungguk
    Cho, Hyunsouk
    IEEE ACCESS, 2024, 12 : 14324 - 14333
  • [6] Enhancing Cross-Modal Understanding for Audio Visual Scene-Aware Dialog Through Contrastive Learning
    Xu, Feifei
    Zhou, Wang
    Li, Guangzhen
    Zhong, Zheng
    Zhou, Yingchen
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
  • [7] Elder emotion classification through multimodal fusion of intermediate layers and cross-modal transfer learning
    Sreevidya, P.
    Veni, S.
    Murthy, O. V. Ramana
    SIGNAL IMAGE AND VIDEO PROCESSING, 2022, 16 (05) : 1281 - 1288
  • [9] Gated Multi-modal Fusion with Cross-modal Contrastive Learning for Video Question Answering
    Lyu, Chenyang
    Li, Wenxi
    Ji, Tianbo
    Zhou, Liting
    Gurrin, Cathal
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII, 2023, 14260 : 427 - 438
  • [10] SELF-SUPERVISED LEARNING WITH CROSS-MODAL TRANSFORMERS FOR EMOTION RECOGNITION
    Khare, Aparna
    Parthasarathy, Srinivas
    Sundaram, Shiva
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 381 - 388