A multimodal fusion-based deep learning framework combined with local-global contextual TCNs for continuous emotion recognition from videos

Cited by: 1
Authors
Shi, Congbao [1 ]
Zhang, Yuanyuan [1 ]
Liu, Baolin [1 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Continuous emotion recognition; Local/global contextual temporal convolutional network; Multi-modal attention fusion; Temporal multi-scale information; CONVOLUTION NETWORK; AUDIO;
DOI
10.1007/s10489-024-05329-w
CLC (Chinese Library Classification) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Continuous emotion recognition plays a crucial role in developing friendly and natural human-computer interaction applications. However, two significant challenges remain unresolved in this field: how to effectively fuse complementary information from multiple modalities, and how to capture long-range contextual dependencies during emotional evolution. In this paper, a novel multimodal continuous emotion recognition framework is proposed to address these challenges. For the multimodal fusion challenge, a Multimodal Attention Fusion (MAF) method is proposed to fully exploit the complementarity and redundancy between modalities. To tackle temporal context dependencies, the Local Contextual Temporal Convolutional Network (LC-TCN) and the Global Contextual Temporal Convolutional Network (GC-TCN) are presented. These networks progressively integrate multi-scale temporal contextual information from the input streams of the different modalities. Comprehensive experiments are conducted on the RECOLA and SEWA datasets to assess the effectiveness of the proposed framework. The experimental results demonstrate recognition performance superior to state-of-the-art approaches, achieving 0.834 and 0.671 on RECOLA, and 0.573 and 0.533 on SEWA, in terms of arousal and valence, respectively. These findings point to a new direction for continuous emotion recognition through the exploration of temporal multi-scale information.
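This record does not include the paper's implementation details, so the following is a minimal NumPy sketch of the two generic ingredients the abstract names: dilated causal convolutions (the building block of TCNs, whose stacked dilations yield multi-scale temporal context) and softmax-weighted fusion across modality features (a simplified stand-in for the MAF method). All function names and the fixed scoring vector `w` are illustrative assumptions, not from the paper.

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation):
    """Causal 1-D convolution with dilation: the output at time t depends
    only on x[t], x[t-d], x[t-2d], ... (d = dilation), never on the future."""
    T, k = len(x), len(kernel)
    pad = (k - 1) * dilation                 # left-pad so output length == T
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(kernel[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(T)
    ])

def tcn_receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal conv layers: each layer
    with dilation d adds (kernel_size - 1) * d past timesteps of context."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def attention_fusion(feats, w):
    """Softmax attention over modality features: score each modality,
    normalize the scores, and return the weighted sum of the features.
    feats: (n_modalities, dim) matrix; w: (dim,) scoring vector (fixed
    here for illustration; it would be learned in practice)."""
    scores = feats @ w
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ feats

# Doubling dilations grow the temporal context exponentially with depth:
# kernel size 3 over dilations 1, 2, 4, 8 covers 31 timesteps.
print(tcn_receptive_field(3, [1, 2, 4, 8]))
```

The exponential growth of the receptive field with stacked dilations is what lets a TCN capture both short-range (small-dilation) and long-range (large-dilation) context, which is the "local/global" intuition behind LC-TCN and GC-TCN as described in the abstract.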
Pages: 3040-3057
Page count: 18
Related papers
17 in total
  • [1] A multimodal fusion-based deep learning framework combined with keyframe extraction and spatial and channel attention for group emotion recognition from videos
    Qi, Shubao
    Liu, Baolin
    Pattern Analysis and Applications, 2023, 26(3): 1493-1503
  • [2] Multimodal Local-Global Ranking Fusion for Emotion Recognition
    Liang, Paul Pu
    Zadeh, Amir
    Morency, Louis-Philippe
    ICMI'18: Proceedings of the 20th ACM International Conference on Multimodal Interaction, 2018: 472-476
  • [3] HiMul-LGG: A hierarchical decision fusion-based local-global graph neural network for multimodal emotion recognition in conversation
    Fu, Changzeng
    Qian, Fengkui
    Su, Kaifeng
    Su, Yikai
    Wang, Ze
    Shi, Jiaqi
    Liu, Zhigang
    Liu, Chaoran
    Ishi, Carlos Toshinori
    Neural Networks, 2025, 181
  • [4] Speech Emotion Recognition with Local-Global Aware Deep Representation Learning
    Liu, Jiaxing
    Liu, Zhilei
    Wang, Longbiao
    Guo, Lili
    Dang, Jianwu
    2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 7174-7178
  • [5] A Deep Learning Framework with Optimizations for Facial Expression and Emotion Recognition from Videos
    Nukathati, Ranjit Kumar
    Nagella, Uday Bhaskar
    Kumar, A. P. Siva
    International Journal of Electrical and Computer Engineering Systems, 2025, 16(3): 217-229
  • [6] Emotion Recognition and Classification of Film Reviews Based on Deep Learning and Multimodal Fusion
    Na, Risu
    Sun, Ning
    Wireless Communications & Mobile Computing, 2022
  • [7] Deep Learning-Based Emotion Recognition from Real-Time Videos
    Zhou, Wenbin
    Cheng, Justin
    Lei, Xingyu
    Benes, Bedrich
    Adamo, Nicoletta
    Human-Computer Interaction. Multimodal and Natural Interaction, HCI 2020, Pt II, 2020, 12182: 321-332
  • [8] A Deep-Learning-Based Multimodal Data Fusion Framework for Urban Region Function Recognition
    Yu, Mingyang
    Xu, Haiqing
    Zhou, Fangliang
    Xu, Shuai
    Yin, Hongling
    ISPRS International Journal of Geo-Information, 2023, 12(12)