Multimodal Bi-direction Guided Attention Networks for Visual Question Answering

Cited by: 0
Authors
Cai, Linqin [1 ]
Xu, Nuoying [1 ]
Tian, Hang [1 ]
Chen, Kejia [2 ]
Fan, Haodu [1 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Res Ctr Artificial Intelligence & Smart Educ, Chongqing 400065, Peoples R China
[2] Chengdu Huawei Technol Co Ltd, Chengdu 500643, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual question answering; Attention mechanism; Position attention; Deep learning; FUSION; KNOWLEDGE;
DOI
10.1007/s11063-023-11403-0
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual question answering (VQA) has become a research hotspot at the intersection of computer vision and natural language processing. A core challenge in VQA is how to fuse multi-modal features from images and questions. This paper proposes a Multimodal Bi-direction Guided Attention Network (MBGAN) for VQA that combines visual relationships with attention to achieve more refined feature fusion. Specifically, self-attention is used to extract image features and text features, while guided attention captures the correlation between each image region and the related question. To exploit the relative positions of different objects, position attention is further introduced to model relationship correlations and enhance the matching ability of multi-modal features. Given an image and a natural language question, the proposed MBGAN learns visual relation inference and question attention networks in parallel to achieve fine-grained fusion of visual and textual features; the final answers are then obtained accurately through model stacking. MBGAN achieves 69.41% overall accuracy on the VQA-v1 dataset, 70.79% on VQA-v2, and 68.79% on COCO-QA, showing that it outperforms most state-of-the-art models.
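The guided-attention step described in the abstract, where question tokens attend over image regions, can be sketched as standard scaled dot-product cross-attention. This is a minimal illustrative sketch only, not MBGAN's actual implementation; the function names, feature dimensions, and single-head form are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guided_attention(queries, keys_values, d_k):
    """One modality (queries, e.g. question tokens) attends over
    another (keys_values, e.g. image region features)."""
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (n_q, n_kv)
    weights = softmax(scores, axis=-1)                # rows sum to 1
    return weights @ keys_values, weights             # attended features

# toy example: 3 question tokens guide attention over 5 image regions
rng = np.random.default_rng(0)
q = rng.standard_normal((3, 8))   # hypothetical question token features
v = rng.standard_normal((5, 8))   # hypothetical image region features
attended, w = guided_attention(q, v, d_k=8)
```

In the full model this operation would be applied in both directions (question-guided image attention and image-guided question attention) and stacked with self-attention and position-attention layers, per the abstract.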
Pages: 11921-11943
Page count: 23