ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING

Citations: 0

Authors:

Gu, Geonmo [1]

Kim, Seong Tae [1]

Ro, Yong Man [1]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Image & Video Syst Lab, Daejeon, South Korea

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME) | 2017

Funding:

National Research Foundation, Singapore;

Keywords:

Visual Question Answering; Visual attention; Textual attention; Adaptive fusion; Deep learning;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Visual Question Answering (VQA) requires automatic understanding of both a reference image and a natural language question. Generating a visual attention map that focuses on the regions relevant to the question can improve VQA performance. In this paper, we propose an adaptive attention-based VQA network. The proposed method exploits the complementary information in attention maps derived from three levels of word embedding (word level, phrase level, and question level) and adaptively fuses this information to represent the image-question pair. Comparative experiments on the public COCO-QA dataset validate the proposed method, and the results show that it outperforms previous methods in terms of accuracy.
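The abstract gives no equations, so the following is only a minimal sketch of what "adaptively fusing" three attention levels could look like, not the authors' implementation. It assumes each level (word, phrase, question) yields attention logits over the same image regions, and that a softmax gate weights the three attended feature vectors. The function names, the toy features, and the fixed gate logits are all hypothetical; in a trained model the gate logits would presumably be predicted from the image-question pair.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, attention_logits):
    """Attention-weighted sum of region features for one embedding level."""
    weights = softmax(attention_logits)
    dim = len(region_features[0])
    return [sum(w * feat[d] for w, feat in zip(weights, region_features))
            for d in range(dim)]

def adaptive_fuse(attended, gate_logits):
    """Fuse per-level attended vectors with softmax gate weights.

    The gate is an assumption for illustration: here the logits are given
    by the caller rather than produced by a learned layer.
    """
    gates = softmax(gate_logits)
    dim = len(attended[0])
    fused = [sum(g * vec[d] for g, vec in zip(gates, attended))
             for d in range(dim)]
    return fused, gates

# Toy example: 3 image regions with 2-dimensional features (made-up values).
region_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
level_logits = {
    "word": [2.0, 0.0, 0.0],      # word-level attention favours region 0
    "phrase": [0.0, 2.0, 0.0],    # phrase-level attention favours region 1
    "question": [0.0, 0.0, 2.0],  # question-level attention favours region 2
}
attended = [attend(region_features, logits) for logits in level_logits.values()]
fused, gates = adaptive_fuse(attended, gate_logits=[0.0, 0.0, 0.0])
```

With equal gate logits the fused vector is the plain average of the three attended vectors; raising one gate logit shifts the fused representation toward that level's attention map.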


Pages: 997-1002

Page count: 6


[41] The multi-modal fusion in visual question answering: a review of attention mechanisms
Lu, Siyu
Liu, Mingzhe
Yin, Lirong
Yin, Zhengtong
Liu, Xuan
Zheng, Wenfeng
PEERJ COMPUTER SCIENCE, 2023, 9
[42] Visual Question Answering using Explicit Visual Attention
Lioutas, Vasileios
Passalis, Nikolaos
Tefas, Anastasios
2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018
[43] Advanced Visual and Textual Co-context Aware Attention Network with Dependent Multimodal Fusion Block for Visual Question Answering
Asri H.S.
Safabakhsh R.
Multimedia Tools and Applications, 2024, 83 (40) : 87959 - 87986
[44] Guiding Visual Question Answering with Attention Priors
Le, Thao Minh
Le, Vuong
Gupta, Sunil
Venkatesh, Svetha
Tran, Truyen
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023 : 4370 - 4379
[45] Re-Attention for Visual Question Answering
Guo, Wenya
Zhang, Ying
Yang, Jufeng
Yuan, Xiaojie
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 6730 - 6743
[46] Re-Attention for Visual Question Answering
Guo, Wenya
Zhang, Ying
Wu, Xiaoping
Yang, Jufeng
Cai, Xiangrui
Yuan, Xiaojie
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 91 - 98
[47] DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram Question Answering
Wang, Yaxian
Wei, Bifan
Liu, Jun
Zhang, Lingling
Wang, Jiaxin
Wang, Qianying
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 4812 - 4827
[48] Feature Enhancement in Attention for Visual Question Answering
Lin, Yuetan
Pang, Zhangyang
Wang, Donghui
Zhuang, Yueting
PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018 : 4216 - 4222
[49] Dynamic Capsule Attention for Visual Question Answering
Zhou, Yiyi
Ji, Rongrong
Su, Jinsong
Sun, Xiaoshuai
Chen, Weiqiu
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019 : 9324 - 9331
[50] Multi-Channel Co-Attention Network for Visual Question Answering
Tian, Weidong
He, Bin
Wang, Nanxun
Zhao, Zhongqiu
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
