Enhancing Emotion Recognition in Conversation Through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

Cited by: 0
Authors
Shi, Haoxiang [1 ,2 ]
Zhang, Xulong [1 ]
Cheng, Ning [1 ]
Zhang, Yong [1 ]
Yu, Jun [2 ]
Xiao, Jing [1 ]
Wang, Jianzong [1 ]
Affiliations
[1] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024 | 2024 / Vol. 14877
Keywords
Emotion recognition; Multi-modal fusion; Contrastive learning;
DOI
10.1007/978-981-97-5669-8_32
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple connections for cross-modal fusion and ignored the information differences between modalities, so the model could not focus on modality-specific emotional information. At the same time, the information shared between modalities was left unprocessed, which introduces an information redundancy problem for emotion prediction. To overcome these limitations, we propose a cross-modal fusion emotion prediction network based on vector connections. The network comprises two stages: a multi-modal feature fusion stage based on connection vectors and an emotion classification stage based on the fused features. Furthermore, we design a supervised inter-class contrastive learning module based on emotion labels. Experimental results confirm the effectiveness of the proposed method, demonstrating excellent performance on the IEMOCAP and MELD datasets.
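The paper's implementation is not part of this record, but the two-stage design and the label-supervised contrastive objective described in the abstract can be sketched roughly as below. The module names, feature dimensions, the fusion operator (plain concatenation of modality vectors), and the SupCon-style loss are illustrative assumptions, not the authors' actual architecture.

```python
# Illustrative PyTorch sketch (not the authors' code): vector-concatenation
# cross-modal fusion followed by emotion classification, plus a supervised
# inter-class contrastive loss driven by emotion labels.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalFusionERC(nn.Module):
    """Hypothetical two-stage model: fuse modality vectors, then classify."""

    def __init__(self, d_text=768, d_audio=128, d_video=256, d_fused=256, n_classes=6):
        super().__init__()
        # Stage 1: project each modality and fuse by concatenating the vectors.
        self.proj_t = nn.Linear(d_text, d_fused)
        self.proj_a = nn.Linear(d_audio, d_fused)
        self.proj_v = nn.Linear(d_video, d_fused)
        self.fuse = nn.Sequential(nn.Linear(3 * d_fused, d_fused), nn.ReLU())
        # Stage 2: emotion classification from the fused feature.
        self.classifier = nn.Linear(d_fused, n_classes)

    def forward(self, x_t, x_a, x_v):
        z = torch.cat([self.proj_t(x_t), self.proj_a(x_a), self.proj_v(x_v)], dim=-1)
        fused = self.fuse(z)  # fused utterance representation
        return fused, self.classifier(fused)


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style loss: pull same-emotion utterances together, push others apart."""
    f = F.normalize(features, dim=-1)
    sim = f @ f.t() / temperature
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)  # exclude self-pairs from the positives
    logits_mask = 1.0 - torch.eye(len(labels), device=f.device)
    log_prob = sim - torch.log((logits_mask * sim.exp()).sum(1, keepdim=True) + 1e-12)
    pos_count = pos_mask.sum(1).clamp(min=1)
    return -(pos_mask * log_prob).sum(1).div(pos_count).mean()


# Example training step on random tensors (batch size and dimensions are assumptions).
model = CrossModalFusionERC()
x_t, x_a, x_v = torch.randn(8, 768), torch.randn(8, 128), torch.randn(8, 256)
labels = torch.randint(0, 6, (8,))
fused, logits = model(x_t, x_a, x_v)
loss = F.cross_entropy(logits, labels) + 0.5 * supervised_contrastive_loss(fused, labels)
loss.backward()
```

The weighting between the cross-entropy term and the contrastive term (0.5 here) is an arbitrary placeholder; the paper does not specify its loss composition in this abstract.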
Pages: 391 - 401
Number of pages: 11