ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING

被引:0
|
作者
Gu, Geonmo [1 ]
Kim, Seong Tae [1 ]
Ro, Yong Man [1 ]
机构
[1] Korea Adv Inst Sci & Technol, Image & Video Syst Lab, Daejeon, South Korea
基金
新加坡国家研究基金会;
关键词
Visual Question Answering; Visual attention; Textual attention; Adaptive fusion; Deep learning;
D O I
暂无
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Automatic understanding of the content of a reference image and natural language questions is needed in Visual Question Answering (VQA). Generating a visual attention map that focuses on the regions related to the context of the question can improve performance of VQA. In this paper, we propose adaptive attention-based VQA network. The proposed method utilizes the complementary information from the attention maps depending on three levels of word embedding (word level, phrase level, and question level embedding), and adaptively fuses the information to represent the image-question pair appropriately. Comparative experiments have been conducted on the public COCO-QA database to validate the proposed method. Experimental results have shown that the proposed method outperforms previous methods in terms of accuracy.
引用
收藏
页码:997 / 1002
页数:6
相关论文
共 50 条
  • [41] The multi-modal fusion in visual question answering: a review of attention mechanisms
    Lu, Siyu
    Liu, Mingzhe
    Yin, Lirong
    Yin, Zhengtong
    Liu, Xuan
    Zheng, Wenfeng
    PEERJ COMPUTER SCIENCE, 2023, 9
  • [42] Visual Question Answering using Explicit Visual Attention
    Lioutas, Vasileios
    Passalis, Nikolaos
    Tefas, Anastasios
    2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018,
  • [43] Advanced Visual and Textual Co-context Aware Attention Network with Dependent Multimodal Fusion Block for Visual Question Answering
    Asri H.S.
    Safabakhsh R.
    Multimedia Tools and Applications, 2024, 83 (40) : 87959 - 87986
  • [44] Guiding Visual Question Answering with Attention Priors
    Le, Thao Minh
    Le, Vuong
    Gupta, Sunil
    Venkatesh, Svetha
    Tran, Truyen
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 4370 - 4379
  • [45] Re-Attention for Visual Question Answering
    Guo, Wenya
    Zhang, Ying
    Yang, Jufeng
    Yuan, Xiaojie
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 6730 - 6743
  • [46] Re-Attention for Visual Question Answering
    Guo, Wenya
    Zhang, Ying
    Wu, Xiaoping
    Yang, Jufeng
    Cai, Xiangrui
    Yuan, Xiaojie
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 91 - 98
  • [47] DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram Question Answering
    Wang, Yaxian
    Wei, Bifan
    Liu, Jun
    Zhang, Lingling
    Wang, Jiaxin
    Wang, Qianying
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 4812 - 4827
  • [48] Feature Enhancement in Attention for Visual Question Answering
    Lin, Yuetan
    Pang, Zhangyang
    Wang, Donghui
    Zhuang, Yueting
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 4216 - 4222
  • [49] Dynamic Capsule Attention for Visual Question Answering
    Zhou, Yiyi
    Ji, Rongrong
    Su, Jinsong
    Sun, Xiaoshuai
    Chen, Weiqiu
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9324 - 9331
  • [50] Multi-Channel Co-Attention Network for Visual Question Answering
    Tian, Weidong
    He, Bin
    Wang, Nanxun
    Zhao, Zhongqiu
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,