Multi-Attention Fusion Network for Video-based Emotion Recognition

Cited by: 23
Authors
Wang, Yanan [1 ]
Wu, Jianming [1 ]
Hoashi, Keiichiro [1 ]
Affiliations
[1] KDDI Res Inc, Saitama, Japan
Keywords
Emotion recognition; Multimodal; Attention mechanism; Multimodal domain adaptation; Fusion network;
DOI: 10.1145/3340555.3355720
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
Humans routinely attend to salient emotional cues from the visual and audio modalities without explicitly aligning them, and recognize emotions by integrating important multimodal information over certain intervals. In this paper, we propose a multi-attention fusion network (MAFN) that improves emotion recognition performance by modeling this human recognition mechanism. MAFN consists of two types of attention mechanisms: an intra-modality attention mechanism dynamically extracts representative emotion features from each single-modality frame sequence, and an inter-modality attention mechanism automatically highlights specific modality features according to their importance. In addition, we define a multimodal domain adaptation method that helps capture interactions between the modalities. MAFN achieves 58.65% recognition accuracy on the AFEW test set, a significant improvement over the 41.07% baseline.
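As a rough illustration of the two attention stages described in the abstract, the following PyTorch sketch pools each modality's frame sequence with an intra-modality attention layer and then weights the resulting modality embeddings with an inter-modality attention layer before classification. All class names, feature dimensions, and the 7-class output head are illustrative assumptions and do not reflect the authors' implementation.

import torch
import torch.nn as nn


class IntraModalityAttention(nn.Module):
    """Attention-weighted pooling of one modality's frame-level features."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_frames, feat_dim)
        weights = torch.softmax(self.score(frames), dim=1)  # attention over frames
        return (weights * frames).sum(dim=1)                # (batch, feat_dim)


class InterModalityAttention(nn.Module):
    """Weights modality embeddings by importance and classifies the fused vector."""

    def __init__(self, feat_dim: int, num_classes: int = 7):  # 7 emotion classes assumed, as in AFEW
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, modal_feats: torch.Tensor) -> torch.Tensor:
        # modal_feats: (batch, num_modalities, feat_dim)
        weights = torch.softmax(self.score(modal_feats), dim=1)  # attention over modalities
        fused = (weights * modal_feats).sum(dim=1)               # (batch, feat_dim)
        return self.classifier(fused)


if __name__ == "__main__":
    batch, num_frames, dim = 4, 16, 128
    visual = torch.randn(batch, num_frames, dim)   # stand-in for per-frame visual features
    audio = torch.randn(batch, num_frames, dim)    # stand-in for per-frame acoustic features

    intra = IntraModalityAttention(dim)
    inter = InterModalityAttention(dim)

    modal_feats = torch.stack([intra(visual), intra(audio)], dim=1)  # (batch, 2, dim)
    logits = inter(modal_feats)                                      # (batch, 7)
    print(logits.shape)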
Pages: 595-601
Number of pages: 7
Related papers
50 records in total
  • [21] HEROES: A Video-Based Human Emotion Recognition Database
    Mannocchi, Ilaria
    Lamichhane, Kamal
    Carli, Marco
    Battisti, Federica
    2022 10TH EUROPEAN WORKSHOP ON VISUAL INFORMATION PROCESSING (EUVIP), 2022,
  • [22] DUAL FOCUS ATTENTION NETWORK FOR VIDEO EMOTION RECOGNITION
    Qiu, Haonan
    He, Liang
    Wang, Feng
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [23] A transformer-encoder-based multimodal multi-attention fusion network for sentiment analysis
    Liu, Cong
    Wang, Yong
    Yang, Jing
    APPLIED INTELLIGENCE, 2024, 54 (17-18) : 8415 - 8441
  • [24] A small object detection model for drone images based on multi-attention fusion network
    Hu, Jie
    Pang, Ting
    Peng, Bo
    Shi, Yongguo
    Li, Tianrui
    IMAGE AND VISION COMPUTING, 2025, 155
  • [25] Video-based Emotion Recognition Using Multi-dichotomy RNN-DNN
    Ren, Taorui
    Ruan, Huabin
    Han, Wenjing
    Yang, Tao
    Jiang, Dongmei
    2018 FIRST ASIAN CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII ASIA), 2018,
  • [27] A multi-scale multi-attention network for dynamic facial expression recognition
    Xia, Xiaohan
    Yang, Le
    Wei, Xiaoyong
    Sahli, Hichem
    Jiang, Dongmei
    MULTIMEDIA SYSTEMS, 2022, 28 (02) : 479 - 493
  • [28] Fusion of ConvLSTM and Multi-Attention Mechanism Network for Hyperspectral Image Classification
    Tang Ting
    Xin, Pan
    Luo Xiao-ling
    Gao Xiao-jing
    SPECTROSCOPY AND SPECTRAL ANALYSIS, 2023, 43 (08) : 2608 - 2616
  • [29] MCIENet: A multi-attention context information fusion eccentric segmentation network
    Jia, Shunyuan
    Leng, Lin
    Fan, Jingyuan
    Pan, Xiang
    PROCEEDINGS OF 2024 4TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND INTELLIGENT COMPUTING, BIC 2024, 2024, : 449 - 454
  • [30] Multi-Attention Audio-Visual Fusion Network for Audio Spatialization
    Zhang, Wen
    Shao, Jie
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 394 - 401