ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING

Cited by: 0
Authors
Gu, Geonmo [1]
Kim, Seong Tae [1]
Ro, Yong Man [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Image & Video Syst Lab, Daejeon, South Korea
Funding
National Research Foundation of Singapore
Keywords
Visual Question Answering; Visual attention; Textual attention; Adaptive fusion; Deep learning
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Visual Question Answering (VQA) requires automatic understanding of both the content of a reference image and a natural language question. Generating a visual attention map that focuses on the regions related to the context of the question can improve VQA performance. In this paper, we propose an adaptive attention-based VQA network. The proposed method exploits the complementary information in attention maps computed at three levels of word embedding (word-level, phrase-level, and question-level embedding) and adaptively fuses this information to represent the image-question pair appropriately. Comparative experiments were conducted on the public COCO-QA database to validate the proposed method. The experimental results show that the proposed method outperforms previous methods in terms of accuracy.
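The abstract gives no equations, so the following is only an illustrative sketch of the general idea it describes: one attention map per embedding level, followed by a learned weighting that fuses the three attended image features. Every name here (`attend`, `adaptive_fuse`, `W`, `w_gate`), the bilinear scoring, the softmax gate, and the random inputs are assumptions made for illustration, not the authors' architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(regions, query, W):
    # regions: (R, D) image region features; query: (D,) text feature.
    # Hypothetical bilinear score between each region and the query.
    scores = regions @ (W @ query)      # (R,)
    alpha = softmax(scores)             # attention map over regions
    return alpha, alpha @ regions       # (R,), attended feature (D,)

def adaptive_fuse(attended, queries, w_gate):
    # attended, queries: (3, D), one row per level (word, phrase, question).
    # Hypothetical gate: one scalar score per level, normalised by softmax.
    gate_in = np.concatenate([attended, queries], axis=1)  # (3, 2D)
    beta = softmax(gate_in @ w_gate)    # (3,) fusion weights
    return beta, beta @ attended        # fused feature (D,)

rng = np.random.default_rng(0)
R, D = 6, 4                             # toy sizes, chosen arbitrarily
regions = rng.normal(size=(R, D))
queries = rng.normal(size=(3, D))       # stand-ins for the three text levels
W = rng.normal(size=(D, D))
w_gate = rng.normal(size=(2 * D,))

maps, feats = zip(*(attend(regions, q, W) for q in queries))
beta, fused = adaptive_fuse(np.stack(feats), queries, w_gate)
```

In this sketch the weights `beta` depend on the inputs, so the contribution of each level can change from one image-question pair to the next, which is one plausible reading of "adaptively fuses".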
Pages: 997-1002
Number of pages: 6