Dual-feature collaborative relation-attention networks for visual question answering

Cited by: 1
Authors
Yao, Lu [1 ]
Yang, You [1 ,2 ]
Hu, Juntao [1 ]
Affiliations
[1] Chongqing Normal Univ, Sch Comp & Informat Sci, Chongqing 401331, Peoples R China
[2] Natl Ctr Appl Math Chongqing, Chongqing 401331, Peoples R China
Keywords
Visual question answering; Region feature; Grid feature; Relation attention; Positional encoding;
DOI
10.1007/s13735-023-00283-8
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Region and grid features extracted by object detection networks contain abundant image information and are widely used in visual question answering (VQA). Region features capture object-level information, whereas grid features better represent contextual information and fine-grained image attributes. However, most existing VQA models process visual information with one-way attention, failing to capture the internal relations between objects or to analyze feature details. To address this issue, we propose a novel multi-level collaborative decoder (MLCD) layer, built on the encoder-decoder framework, that incorporates visual location vectors into attention. Specifically, each MLCD layer is equipped with three different attention-MLP sub-modules that progressively and accurately mine the intrinsic interactions of features and strengthen the influence of image content on prediction results. Additionally, to fully exploit the respective advantages of the two feature types, we propose a novel relativity-augmented cross-attention (RACA) unit and add it to the MLCD layer; in this unit, features that have passed through simple attention are complementarily augmented with global information and self-attributes. To validate the proposed methods, we stack MLCD layers deeply to constitute our dual-feature collaborative relation-attention network (DFCRAN). Extensive experiments and visualizations on three benchmark datasets (COCO-QA, VQA 1.0, and VQA 2.0) demonstrate the effectiveness of our model, which achieves performance competitive with state-of-the-art single models trained without pre-training.
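The following is a minimal PyTorch sketch of the dual-feature cross-attention idea the abstract describes: region and grid features attend to each other, and each stream is then complementarily augmented with pooled global information and its own pre-attention self-attributes via a learned gate. All module names, dimensions, and the gating scheme are assumptions for illustration, not the authors' RACA/MLCD implementation.

```python
# Hypothetical sketch of a relativity-augmented cross-attention (RACA) unit.
# Region features attend to grid features (and vice versa); each stream is
# then complementarily augmented with a pooled global vector and its own
# pre-attention "self-attributes" through a sigmoid gate. Dimensions, gating,
# and naming are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class RACASketch(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Cross-attention in both directions between the two feature streams.
        self.region_to_grid = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.grid_to_region = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gates that mix attended output, global context, and self-attributes.
        self.region_gate = nn.Linear(3 * dim, dim)
        self.grid_gate = nn.Linear(3 * dim, dim)
        self.norm_r = nn.LayerNorm(dim)
        self.norm_g = nn.LayerNorm(dim)

    @staticmethod
    def _augment(attended, original, gate, norm):
        # Global information: mean-pool the attended sequence, broadcast back.
        global_ctx = attended.mean(dim=1, keepdim=True).expand_as(attended)
        # Fuse attended features, global context, and self-attributes (the
        # original, pre-attention features) through a sigmoid gate.
        g = torch.sigmoid(gate(torch.cat([attended, global_ctx, original], dim=-1)))
        return norm(g * attended + (1.0 - g) * original)

    def forward(self, region_feats, grid_feats):
        # region_feats: (B, N_r, dim); grid_feats: (B, N_g, dim)
        r_att, _ = self.region_to_grid(region_feats, grid_feats, grid_feats)
        g_att, _ = self.grid_to_region(grid_feats, region_feats, region_feats)
        region_out = self._augment(r_att, region_feats, self.region_gate, self.norm_r)
        grid_out = self._augment(g_att, grid_feats, self.grid_gate, self.norm_g)
        return region_out, grid_out


if __name__ == "__main__":
    raca = RACASketch()
    regions = torch.randn(2, 36, 512)  # e.g. 36 detected regions
    grids = torch.randn(2, 49, 512)    # e.g. a 7x7 grid of features
    r, g = raca(regions, grids)
    print(r.shape, g.shape)            # torch.Size([2, 36, 512]), torch.Size([2, 49, 512])
```

In this reading, the gate decides per position how much of the cross-attended signal to keep versus the stream's original features, which is one plausible way to realize the "complementary augmentation using global information and self-attributes" that the abstract mentions.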
Pages: 15
Related Papers
50 records in total
  • [1] Dual-feature collaborative relation-attention networks for visual question answering
    Yao, Lu
    Yang, You
    Hu, Juntao
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2023, 12
  • [2] Feature Enhancement in Attention for Visual Question Answering
    Lin, Yuetan
    Pang, Zhangyang
    Wang, Donghui
    Zhuang, Yueting
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 4216 - 4222
  • [3] Feature Fusion Attention Visual Question Answering
    Wang, Chunlin
    Sun, Jianyong
    Chen, Xiaolin
    ICMLC 2019: 2019 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2019, : 412 - 416
  • [4] Dual Self-Guided Attention with Sparse Question Networks for Visual Question Answering
    Shen, Xiang
    Han, Dezhi
    Chang, Chin-Chen
    Zong, Liang
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (04) : 785 - 796
  • [5] Dual self-attention with co-attention networks for visual question answering
    Liu, Yun
    Zhang, Xiaoming
    Zhang, Qianyun
    Li, Chaozhuo
    Huang, Feiran
    Tang, Xianghong
    Li, Zhoujun
PATTERN RECOGNITION, 2021, 117
  • [6] Collaborative Attention Network to Enhance Visual Question Answering
    Gu, Rui
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 124 : 304 - 305
  • [7] Dual Attention and Question Categorization-Based Visual Question Answering
    Mishra, A.
    Anand, A.
    Guha, P.
    IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2023, 4 (01): 81 - 91
  • [8] Multi-modal co-attention relation networks for visual question answering
    Guo, Zihan
    Han, Dezhi
    VISUAL COMPUTER, 2023, 39 (11): 5783 - 5795
  • [9] Dual-Branch Collaborative Learning for Visual Question Answering
    Tian, Weidong
    Zhao, Junxiang
    Xu, Wenzheng
    Zhao, Zhongqiu
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14864 : 96 - 107