Multi-Attention Fusion Network for Video-based Emotion Recognition

Cited by: 23
Authors
Wang, Yanan [1 ]
Wu, Jianming [1 ]
Hoashi, Keiichiro [1 ]
Affiliations
[1] KDDI Res Inc, Saitama, Japan
Keywords
Emotion recognition; Multimodal; Attention mechanism; Multimodal domain adaptation; Fusion network;
DOI
10.1145/3340555.3355720
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Humans routinely pay attention to important emotion information from visual and audio modalities without considering multimodal alignment issues, and recognize emotions by integrating important multimodal information at certain intervals. In this paper, we propose a multiple attention fusion network (MAFN) with the goal of improving emotion recognition performance by modeling human emotion recognition mechanisms. MAFN consists of two types of attention mechanisms: the intra-modality attention mechanism dynamically extracts representative emotion features from single-modal frame sequences, while the inter-modality attention mechanism automatically highlights specific modal features based on their importance. In addition, we define a multimodal domain adaptation method that helps capture interactions between modalities. MAFN achieves 58.65% recognition accuracy on the AFEW test set, a significant improvement over the baseline of 41.07%.
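To make the two attention stages described in the abstract concrete, below is a minimal PyTorch sketch of intra-modality attention (pooling each modality's frame sequence with learned per-frame weights) followed by inter-modality attention (weighting each modality's pooled feature by its estimated importance before fusion). All module names, feature dimensions, the two-modality setup, and the classifier head are illustrative assumptions, not the published MAFN architecture; the multimodal domain adaptation component is omitted.

```python
# Hypothetical sketch of intra- and inter-modality attention fusion.
# Not the authors' code; dimensions and layer choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class IntraModalityAttention(nn.Module):
    """Pools one modality's frame sequence into a single vector
    using learned per-frame attention weights."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_frames, feat_dim)
        weights = F.softmax(self.score(frames), dim=1)   # (batch, num_frames, 1)
        return (weights * frames).sum(dim=1)             # (batch, feat_dim)


class InterModalityAttention(nn.Module):
    """Weights each modality's pooled feature by its estimated
    importance before summing them into a fused representation."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, modal_feats: torch.Tensor) -> torch.Tensor:
        # modal_feats: (batch, num_modalities, feat_dim)
        weights = F.softmax(self.score(modal_feats), dim=1)  # (batch, num_modalities, 1)
        return (weights * modal_feats).sum(dim=1)            # (batch, feat_dim)


class MAFNSketch(nn.Module):
    """Toy end-to-end pipeline: per-modality temporal attention,
    cross-modality attention fusion, then emotion classification."""

    def __init__(self, feat_dim: int = 256, num_classes: int = 7):
        super().__init__()
        self.visual_attn = IntraModalityAttention(feat_dim)
        self.audio_attn = IntraModalityAttention(feat_dim)
        self.fusion_attn = InterModalityAttention(feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, visual_frames: torch.Tensor, audio_frames: torch.Tensor) -> torch.Tensor:
        v = self.visual_attn(visual_frames)                   # (batch, feat_dim)
        a = self.audio_attn(audio_frames)                     # (batch, feat_dim)
        fused = self.fusion_attn(torch.stack([v, a], dim=1))  # (batch, feat_dim)
        return self.classifier(fused)                         # emotion logits


if __name__ == "__main__":
    model = MAFNSketch()
    visual = torch.randn(4, 16, 256)   # 4 clips, 16 visual frames, 256-d features (assumed)
    audio = torch.randn(4, 16, 256)    # matching audio segment features (assumed)
    print(model(visual, audio).shape)  # torch.Size([4, 7])
```

The key design point the sketch illustrates is that temporal alignment between modalities is not required: each modality is first summarized independently over time, and only the pooled modality-level features compete for attention at the fusion stage.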
Pages: 595-601
Number of pages: 7
Related Papers
50 records in total
  • [31] M3ANet: Multi-Modal and Multi-Attention Fusion Network for Ship License Plate Recognition
    Zhou, Chunyi
    Liu, Dekang
    Wang, Tianlei
    Tian, Jiangmin
    Cao, Jiuwen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 5976 - 5986
  • [32] Multi-attention guided feature fusion network for salient object detection
    Li, Anni
    Qi, JinQing
    Lu, Huchuan
    NEUROCOMPUTING, 2020, 411 : 416 - 427
  • [33] Domain adaptation based on feature fusion and multi-attention mechanism*
    Wang, Tiansheng
    Liu, Zhonghua
    Ou, Weihua
    Huo, Hua
    COMPUTERS & ELECTRICAL ENGINEERING, 2023, 108
  • [34] Ingredient-Guided Cascaded Multi-Attention Network for Food Recognition
    Min, Weiqing
    Liu, Linhu
    Luo, Zhengdong
    Jiang, Shuqiang
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1331 - 1339
  • [35] Video-Based Emotion Recognition in the Wild for Online Education Systems
    Mai, Genting
    Guo, Zijian
    She, Yicong
    Wang, Hongni
    Liang, Yan
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT III, 2022, 13631 : 516 - 529
  • [36] EEG-fNIRS emotion recognition based on multi-brain attention mechanism capsule fusion network
    Liu, Yue
    Zhang, Xueying
    Chen, Guijun
    Huang, Lixia
    Sun, Ying
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (11): : 2247 - 2257
  • [37] Residual attention fusion network for video action recognition
    Li, Ao
    Yi, Yang
    Liang, Daan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98
  • [38] Audio and Video-based Emotion Recognition using Multimodal Transformers
    John, Vijay
    Kawanishi, Yasutomo
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 2582 - 2588
  • [39] Self-guided Multi-attention Network for Periventricular Leukomalacia Recognition
    Wang, Zhuochen
    Huang, Tingting
    Xiao, Bin
    Huo, Jiayu
    Wang, Sheng
    Jiang, Haoxiang
    Liu, Heng
    Wu, Fan
    Zhou, Xiang
    Xue, Zhong
    Yang, Jian
    Wang, Qian
    PREDICTIVE INTELLIGENCE IN MEDICINE, PRIME 2021, 2021, 12928 : 128 - 137
  • [40] A multidimensional feature fusion network based on MGSE and TAAC for video-based human action recognition
    Zhou, Shuang
    Xu, Hongji
    Bai, Zhiquan
    Du, Zhengfeng
    Zeng, Jiaqi
    Wang, Yang
    Wang, Yuhao
    Li, Shijie
    Wang, Mengmeng
    Li, Yiran
    Li, Jianjun
    Xu, Jie
    NEURAL NETWORKS, 2023, 168 : 496 - 507