Multi-Attention Fusion Network for Video-based Emotion Recognition

Cited by: 23
Authors
Wang, Yanan [1 ]
Wu, Jianming [1 ]
Hoashi, Keiichiro [1 ]
Affiliations
[1] KDDI Res Inc, Saitama, Japan
Keywords
Emotion recognition; Multimodal; Attention mechanism; Multimodal domain adaptation; Fusion network;
DOI
10.1145/3340555.3355720
CLC number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Humans routinely pay attention to important emotion information from visual and audio modalities without considering multimodal alignment issues, and recognize emotions by integrating important multimodal information over a certain interval. In this paper, we propose a multiple attention fusion network (MAFN) that aims to improve emotion recognition performance by modeling human emotion recognition mechanisms. MAFN consists of two types of attention mechanisms: an intra-modality attention mechanism that dynamically extracts representative emotion features from single-modality frame sequences, and an inter-modality attention mechanism that automatically highlights specific modal features based on their importance. In addition, we define a multimodal domain adaptation method that has a positive effect on capturing interactions between modalities. MAFN achieved 58.65% recognition accuracy on the AFEW test set, a significant improvement over the baseline of 41.07%.
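The two attention types described in the abstract can be illustrated with a minimal toy sketch: intra-modality attention pools a variable-length frame sequence of one modality into a single vector, and inter-modality attention weights the resulting modality vectors by importance before fusion. This is not the authors' actual MAFN architecture; the feature dimensions, the dot-product scoring with assumed learnable vectors `w` and `v`, and the random inputs are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def intra_modality_attention(frames, w):
    # frames: (T, d) frame-level features of ONE modality (T may differ per modality,
    # sidestepping frame-level alignment); w: (d,) assumed learnable scoring vector
    scores = softmax(frames @ w)      # (T,) attention weight per frame
    return scores @ frames            # (d,) representative vector for the modality

def inter_modality_attention(modality_vecs, v):
    # modality_vecs: list of (d,) vectors, one per modality; v: (d,) scoring vector
    stacked = np.stack(modality_vecs)     # (M, d)
    weights = softmax(stacked @ v)        # (M,) importance per modality
    return weights @ stacked              # (d,) fused multimodal representation

rng = np.random.default_rng(0)
visual = rng.normal(size=(16, 8))   # 16 visual frames, 8-dim features (toy numbers)
audio  = rng.normal(size=(20, 8))   # 20 audio frames; lengths need not match
w = rng.normal(size=8)
v = rng.normal(size=8)

vis_vec = intra_modality_attention(visual, w)
aud_vec = intra_modality_attention(audio, w)
fused = inter_modality_attention([vis_vec, aud_vec], v)
print(fused.shape)  # (8,)
```

Because attention pools each stream independently before fusion, the two modalities never need frame-by-frame synchronization, which mirrors the "without considering multimodal alignment issues" point in the abstract.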
Pages: 595-601 (7 pages)
Related Papers (50 records)
  • [1] Hierarchical Attention-Based Multimodal Fusion Network for Video Emotion Recognition
    Liu, Xiaodong
    Li, Songyang
    Wang, Miao
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021
  • [2] Multi-Attention Module for Dynamic Facial Emotion Recognition
    Zhi, Junnan
    Song, Tingting
    Yu, Kang
    Yuan, Fengen
    Wang, Huaqiang
    Hu, Guangyang
    Yang, Hao
    INFORMATION, 2022, 13 (05)
  • [3] CANet: Comprehensive Attention Network for video-based action recognition
    Gao, Xiong
    Chang, Zhaobin
    Ran, Xingcheng
    Lu, Yonggang
    KNOWLEDGE-BASED SYSTEMS, 2024, 296
  • [4] Diverse Features Fusion Network for video-based action recognition
    Deng, Haoyang
    Kong, Jun
    Jiang, Min
    Liu, Tianshan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 77
  • [5] Multi-Attention Network for Unsupervised Video Object Segmentation
    Zhang, Guifang
    Wong, Hon-Cheng
    Lo, Sio-Long
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 71 - 75
  • [6] MAIN: Multi-Attention Instance Network for video segmentation
    Alcazar, Juan Leon
    Bravo, Maria A.
    Jeanneret, Guillaume
    Thabet, Ali K.
    Brox, Thomas
    Arbelaez, Pablo
    Ghanem, Bernard
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 210
  • [7] Multi-Attention Convolutional Neural Network for Video Deblurring
    Zhang, Xiaoqin
    Wang, Tao
    Jiang, Runhua
    Zhao, Li
    Xu, Yuewang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 1986 - 1997
  • [8] STAN: spatiotemporal attention network for video-based facial expression recognition
    Yi, Yufan
    Xu, Yiping
    Ye, Ziyi
    Li, Linhui
    Hu, Xinli
    Tian, Yan
    VISUAL COMPUTER, 2023, 39 (12): 6205 - 6220
  • [9] Video-based emotion recognition in the wild using deep transfer learning and score fusion
    Kaya, Heysem
    Gurpinar, Furkan
    Salah, Albert Ali
    IMAGE AND VISION COMPUTING, 2017, 65 : 66 - 75