ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING

被引:0
|
作者
Gu, Geonmo [1 ]
Kim, Seong Tae [1 ]
Ro, Yong Man [1 ]
机构
[1] Korea Adv Inst Sci & Technol, Image & Video Syst Lab, Daejeon, South Korea
基金
新加坡国家研究基金会;
关键词
Visual Question Answering; Visual attention; Textual attention; Adaptive fusion; Deep learning;
D O I
暂无
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Automatic understanding of the content of a reference image and natural language questions is needed in Visual Question Answering (VQA). Generating a visual attention map that focuses on the regions related to the context of the question can improve performance of VQA. In this paper, we propose adaptive attention-based VQA network. The proposed method utilizes the complementary information from the attention maps depending on three levels of word embedding (word level, phrase level, and question level embedding), and adaptively fuses the information to represent the image-question pair appropriately. Comparative experiments have been conducted on the public COCO-QA database to validate the proposed method. Experimental results have shown that the proposed method outperforms previous methods in terms of accuracy.
引用
收藏
页码:997 / 1002
页数:6
相关论文
共 50 条
  • [31] ARDN: Attention Re-distribution Network for Visual Question Answering
    Yi, Jinyang
    Han, Dezhi
    Chen, Chongqing
    Shen, Xiang
    Zong, Liang
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024,
  • [32] Co-attention graph convolutional network for visual question answering
    Liu, Chuan
    Tan, Ying-Ying
    Xia, Tian-Tian
    Zhang, Jiajing
    Zhu, Ming
    MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2527 - 2543
  • [33] CAAN: Context-Aware attention network for visual question answering
    Chen, Chongqing
    Han, Dezhi
    Chang, Chin -Chen
    PATTERN RECOGNITION, 2022, 132
  • [34] Co-attention graph convolutional network for visual question answering
    Chuan Liu
    Ying-Ying Tan
    Tian-Tian Xia
    Jiajing Zhang
    Ming Zhu
    Multimedia Systems, 2023, 29 : 2527 - 2543
  • [35] GFSNet: Gaussian Fourier with sparse attention network for visual question answering
    Shen, Xiang
    Han, Dezhi
    Chang, Chin-Chen
    Oad, Ammar
    Wu, Huafeng
    ARTIFICIAL INTELLIGENCE REVIEW, 2025, 58 (06)
  • [36] Relation-Aware Graph Attention Network for Visual Question Answering
    Li, Linjie
    Gan, Zhe
    Cheng, Yu
    Liu, Jingjing
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10312 - 10321
  • [37] Path-Wise Attention Memory Network for Visual Question Answering
    Xiang, Yingxin
    Zhang, Chengyuan
    Han, Zhichao
    Yu, Hao
    Li, Jiaye
    Zhu, Lei
    MATHEMATICS, 2022, 10 (18)
  • [38] Question -Led object attention for visual question answering
    Gao, Lianli
    Cao, Liangfu
    Xu, Xing
    Shao, Jie
    Song, Jingkuan
    NEUROCOMPUTING, 2020, 391 : 227 - 233
  • [39] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549
  • [40] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175