ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING

Cited by: 0
Authors
Gu, Geonmo [1]
Kim, Seong Tae [1]
Ro, Yong Man [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Image & Video Syst Lab, Daejeon, South Korea
Funding
National Research Foundation of Singapore
Keywords
Visual Question Answering; Visual attention; Textual attention; Adaptive fusion; Deep learning;
DOI
Not available
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Visual Question Answering (VQA) requires automatic understanding of both a reference image and a natural language question about it. Generating a visual attention map that focuses on the regions related to the context of the question can improve VQA performance. In this paper, we propose an adaptive attention-based VQA network. The proposed method exploits the complementary information in attention maps computed at three levels of embedding (word level, phrase level, and question level), and adaptively fuses this information to represent the image-question pair. Comparative experiments were conducted on the public COCO-QA dataset to validate the proposed method. The experimental results show that the proposed method outperforms previous methods in terms of accuracy.
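The abstract does not give the network's equations, so the following is only an illustrative sketch of the general idea of attending to image regions at several question levels and then fusing the results with adaptive weights. Everything in it is an assumption made for illustration: the dot-product attention, the softmax gate, the toy feature values, and the names `attend`, `adaptive_fuse`, and `gate_scores` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, query):
    """Return the attention-weighted sum of region features for one query.

    Assumption: attention scores are plain dot products between each
    region feature and the query vector (hypothetical choice).
    """
    scores = [sum(r * q for r, q in zip(region, query))
              for region in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    attended = [sum(w * region[d] for w, region in zip(weights, region_features))
                for d in range(dim)]
    return attended, weights


def adaptive_fuse(attended_by_level, gate_scores):
    """Fuse per-level attended features using softmax-normalised gates.

    Assumption: in a trained model the gate scores would be predicted
    from the question; here they are passed in as fixed numbers.
    """
    gates = softmax(gate_scores)
    dim = len(attended_by_level[0])
    fused = [sum(g * feat[d] for g, feat in zip(gates, attended_by_level))
             for d in range(dim)]
    return fused, gates


# Toy data: three image regions with 2-D features (made-up values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# One hypothetical query vector per embedding level.
level_queries = {
    "word": [1.0, 0.0],
    "phrase": [0.0, 1.0],
    "question": [1.0, 1.0],
}

attended = [attend(regions, q)[0] for q in level_queries.values()]
fused, gates = adaptive_fuse(attended, gate_scores=[0.2, 0.1, 0.7])
```

Because the gates are normalised to sum to one, the fused vector is a convex combination of the three per-level attended features, so a level with a higher gate score contributes proportionally more to the joint representation.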
Pages: 997-1002
Number of pages: 6
Related papers
50 in total
  • [21] Improving Visual Question Answering by Multimodal Gate Fusion Network
    Xiang, Shenxiang
    Chen, Qiaohong
    Fang, Xian
    Guo, Menghao
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [22] Adaptive sparse triple convolutional attention for enhanced visual question answering
    Wang, Ronggui
    Chen, Hong
    Yang, Juan
    Xue, Lixia
    VISUAL COMPUTER, 2025,
  • [23] An Improved Attention for Visual Question Answering
    Rahman, Tanzila
    Chou, Shih-Han
    Sigal, Leonid
    Carenini, Giuseppe
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1653 - 1662
  • [24] Differential Attention for Visual Question Answering
    Patro, Badri
    Namboodiri, Vinay P.
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7680 - 7688
  • [25] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 783 - 792
  • [26] Fusing Attention with Visual Question Answering
    Burt, Ryan
    Cudic, Mihael
    Principe, Jose C.
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 949 - 953
  • [27] Modular dual-stream visual fusion network for visual question answering
    Xue, Lixia
    Wang, Wenhao
    Wang, Ronggui
    Yang, Juan
    VISUAL COMPUTER, 2025, 41 (01): : 549 - 562
  • [28] Two-step Joint Attention Network for Visual Question Answering
    Zhang, Weiming
    Zhang, Chunhong
    Liu, Pei
    Zhan, Zhiqiang
    Qiu, Xiaofeng
    2017 3RD INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM), 2017, : 136 - 143
  • [29] CAAN: Context-Aware attention network for visual question answering
    Chen, Chongqing
    Han, Dezhi
    Chang, Chin-Chen
    Pattern Recognition, 2022, 132
  • [30] Mutual Attention Inception Network for Remote Sensing Visual Question Answering
    Zheng, Xiangtao
    Wang, Binqiang
    Du, Xingqian
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60