A multimodal fusion-based deep learning framework combined with local-global contextual TCNs for continuous emotion recognition from videos

Cited by: 1
Authors
Shi, Congbao [1 ]
Zhang, Yuanyuan [1 ]
Liu, Baolin [1 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Continuous emotion recognition; Local/global contextual temporal convolutional network; Multi-modal attention fusion; Temporal multi-scale information; CONVOLUTION NETWORK; AUDIO;
DOI
10.1007/s10489-024-05329-w
CLC (Chinese Library Classification) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Continuous emotion recognition plays a crucial role in developing friendly and natural human-computer interaction applications. However, two significant challenges remain unresolved in this field: how to effectively fuse complementary information from multiple modalities, and how to capture long-range contextual dependencies during emotional evolution. In this paper, a novel multimodal continuous emotion recognition framework is proposed to address these challenges. For the multimodal fusion challenge, a Multimodal Attention Fusion (MAF) method is proposed to fully exploit the complementarity and redundancy between modalities. To tackle temporal context dependencies, the Local Contextual Temporal Convolutional Network (LC-TCN) and the Global Contextual Temporal Convolutional Network (GC-TCN) are presented. These networks progressively integrate multi-scale temporal contextual information from the input streams of the different modalities. Comprehensive experiments are conducted on the RECOLA and SEWA datasets to assess the effectiveness of the proposed framework. The experimental results demonstrate recognition performance superior to state-of-the-art approaches, achieving 0.834 and 0.671 on RECOLA, and 0.573 and 0.533 on SEWA, in terms of arousal and valence, respectively. These findings point to a new direction for continuous emotion recognition through the exploration of temporal multi-scale information.
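This record does not include the paper's implementation details, so the following is a minimal NumPy sketch of the two generic ingredients the abstract names: dilated causal convolutions (the building block of TCNs, whose stacked dilations yield multi-scale temporal context) and softmax-weighted fusion across modality features (a simplified stand-in for the MAF method). All function names and the fixed scoring vector `w` are illustrative assumptions, not from the paper.

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation):
    """Causal 1-D convolution with dilation: the output at time t depends
    only on x[t], x[t-d], x[t-2d], ... (d = dilation), never on the future."""
    T, k = len(x), len(kernel)
    pad = (k - 1) * dilation                 # left-pad so output length == T
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(kernel[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(T)
    ])

def tcn_receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal conv layers: each layer
    with dilation d adds (kernel_size - 1) * d past timesteps of context."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def attention_fusion(feats, w):
    """Softmax attention over modality features: score each modality,
    normalize the scores, and return the weighted sum of the features.
    feats: (n_modalities, dim) matrix; w: (dim,) scoring vector (fixed
    here for illustration; it would be learned in practice)."""
    scores = feats @ w
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ feats

# Doubling dilations grow the temporal context exponentially with depth:
# kernel size 3 over dilations 1, 2, 4, 8 covers 31 timesteps.
print(tcn_receptive_field(3, [1, 2, 4, 8]))
```

The exponential growth of the receptive field with stacked dilations is what lets a TCN capture both short-range (small-dilation) and long-range (large-dilation) context, which is the "local/global" intuition behind LC-TCN and GC-TCN as described in the abstract.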
Pages: 3040-3057
Page count: 18
Related papers
17 in total
  • [1] A multimodal fusion-based deep learning framework combined with keyframe extraction and spatial and channel attention for group emotion recognition from videos
    Qi, Shubao
    Liu, Baolin
    Pattern Analysis and Applications, 2023, 26(3): 1493-1503
  • [2] Multimodal Local-Global Ranking Fusion for Emotion Recognition
    Liang, Paul Pu
    Zadeh, Amir
    Morency, Louis-Philippe
    ICMI'18: Proceedings of the 20th ACM International Conference on Multimodal Interaction, 2018: 472-476
  • [3] HiMul-LGG: A hierarchical decision fusion-based local-global graph neural network for multimodal emotion recognition in conversation
    Fu, Changzeng
    Qian, Fengkui
    Su, Kaifeng
    Su, Yikai
    Wang, Ze
    Shi, Jiaqi
    Liu, Zhigang
    Liu, Chaoran
    Ishi, Carlos Toshinori
    Neural Networks, 2025, 181
  • [4] Speech Emotion Recognition with Local-Global Aware Deep Representation Learning
    Liu, Jiaxing
    Liu, Zhilei
    Wang, Longbiao
    Guo, Lili
    Dang, Jianwu
    2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 7174-7178
  • [5] A Deep Learning Framework with Optimizations for Facial Expression and Emotion Recognition from Videos
    Nukathati, Ranjit Kumar
    Nagella, Uday Bhaskar
    Kumar, A. P. Siva
    International Journal of Electrical and Computer Engineering Systems, 2025, 16(3): 217-229
  • [6] Emotion Recognition and Classification of Film Reviews Based on Deep Learning and Multimodal Fusion
    Na, Risu
    Sun, Ning
    Wireless Communications & Mobile Computing, 2022
  • [7] Deep Learning-Based Emotion Recognition from Real-Time Videos
    Zhou, Wenbin
    Cheng, Justin
    Lei, Xingyu
    Benes, Bedrich
    Adamo, Nicoletta
    Human-Computer Interaction. Multimodal and Natural Interaction, HCI 2020, Pt II, 2020, 12182: 321-332
  • [8] A Deep-Learning-Based Multimodal Data Fusion Framework for Urban Region Function Recognition
    Yu, Mingyang
    Xu, Haiqing
    Zhou, Fangliang
    Xu, Shuai
    Yin, Hongling
    ISPRS International Journal of Geo-Information, 2023, 12(12)