Enhancing Emotion Recognition in Conversation Through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

Cited by: 0
Authors
Shi, Haoxiang [1 ,2 ]
Zhang, Xulong [1 ]
Cheng, Ning [1 ]
Zhang, Yong [1 ]
Yu, Jun [2 ]
Xiao, Jing [1 ]
Wang, Jianzong [1 ]
Affiliations
[1] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024 | 2024 / Vol. 14877
Keywords
Emotion recognition; Multi-modal fusion; Contrastive learning;
DOI
10.1007/978-981-97-5669-8_32
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple connections for cross-modal fusion and ignored the information differences between modalities, so the model could not focus on modality-specific emotional information. At the same time, the information shared between modalities was left unprocessed, which introduces an information redundancy problem for emotion prediction. To overcome these limitations, we propose a cross-modal fusion emotion prediction network based on vector connections. The network comprises two stages: a multi-modal feature fusion stage based on connection vectors and an emotion classification stage based on the fused features. Furthermore, we design a supervised inter-class contrastive learning module based on emotion labels. Experimental results confirm the effectiveness of the proposed method, demonstrating excellent performance on the IEMOCAP and MELD datasets.
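The paper's implementation is not part of this record, but the two-stage design and the label-supervised contrastive objective described in the abstract can be sketched roughly as below. The module names, feature dimensions, the fusion operator (plain concatenation of modality vectors), and the SupCon-style loss are illustrative assumptions, not the authors' actual architecture.

```python
# Illustrative PyTorch sketch (not the authors' code): vector-concatenation
# cross-modal fusion followed by emotion classification, plus a supervised
# inter-class contrastive loss driven by emotion labels.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalFusionERC(nn.Module):
    """Hypothetical two-stage model: fuse modality vectors, then classify."""

    def __init__(self, d_text=768, d_audio=128, d_video=256, d_fused=256, n_classes=6):
        super().__init__()
        # Stage 1: project each modality and fuse by concatenating the vectors.
        self.proj_t = nn.Linear(d_text, d_fused)
        self.proj_a = nn.Linear(d_audio, d_fused)
        self.proj_v = nn.Linear(d_video, d_fused)
        self.fuse = nn.Sequential(nn.Linear(3 * d_fused, d_fused), nn.ReLU())
        # Stage 2: emotion classification from the fused feature.
        self.classifier = nn.Linear(d_fused, n_classes)

    def forward(self, x_t, x_a, x_v):
        z = torch.cat([self.proj_t(x_t), self.proj_a(x_a), self.proj_v(x_v)], dim=-1)
        fused = self.fuse(z)  # fused utterance representation
        return fused, self.classifier(fused)


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss: pull same-emotion utterances together, push others apart."""
    f = F.normalize(features, dim=-1)
    sim = f @ f.t() / temperature
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)  # exclude self-pairs from the positives
    logits_mask = 1.0 - torch.eye(len(labels), device=f.device)
    log_prob = sim - torch.log((logits_mask * sim.exp()).sum(1, keepdim=True) + 1e-12)
    pos_count = pos_mask.sum(1).clamp(min=1)
    return -(pos_mask * log_prob).sum(1).div(pos_count).mean()


# Example training step on random tensors (batch size and dimensions are assumptions).
model = CrossModalFusionERC()
x_t, x_a, x_v = torch.randn(8, 768), torch.randn(8, 128), torch.randn(8, 256)
labels = torch.randint(0, 6, (8,))
fused, logits = model(x_t, x_a, x_v)
loss = F.cross_entropy(logits, labels) + 0.5 * supervised_contrastive_loss(fused, labels)
loss.backward()
```

The weighting between the cross-entropy term and the contrastive term (0.5 here) is an arbitrary placeholder; the paper does not specify its loss composition in this abstract.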
Pages: 391 - 401
Number of pages: 11