Visual Question Answering using Explicit Visual Attention

Cited by: 2
Authors:
Lioutas, Vasileios [1 ]
Passalis, Nikolaos [1 ]
Tefas, Anastasios [1 ]
Affiliations:
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
DOI: 10.1109/ISCAS.2018.8351158
CLC Classification: TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification: 0808; 0809;
Abstract
One of the most complex multimodal problems faced today is Visual Question Answering (VQA), which requires a machine to properly understand a natural-language question about a reference visual input and then produce the answer to that question. To solve this problem and increase the probability of producing the correct answer, it is crucial to provide reliable attention information. However, existing methods only use implicitly trained attention models, which are often unable to attend to the image region the question refers to, limiting their ability to provide the correct answer. To address this issue, we propose an explicitly trained attention model inspired by the theory of the pictorial superiority effect. In this model, we use attention-oriented word embeddings that increase the efficiency of learning common representation spaces. We train and evaluate on the Visual7W dataset, the only dataset that provides visual attention ground-truth information. In this paper, we demonstrate the effectiveness of the proposed method over both implicit attention models and other state-of-the-art VQA techniques.
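
As a rough illustration of the idea described in the abstract, the following is a minimal sketch, assuming a PyTorch-style implementation; the module names, feature dimensions, and the loss weight alpha are illustrative assumptions, not the authors' actual architecture. It shows a question-conditioned attention map over image regions that is supervised explicitly against a ground-truth attention map (such as the annotations provided by Visual7W), in addition to the usual answer-classification loss.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ExplicitAttentionVQA(nn.Module):
    """Question-conditioned attention over image regions, with the attention
    map itself supervised by ground-truth annotations (hypothetical sketch)."""

    def __init__(self, region_dim=2048, question_dim=300,
                 hidden_dim=512, num_answers=1000):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, hidden_dim)
        self.question_proj = nn.Linear(question_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_answers)

    def forward(self, regions, question):
        # regions:  (B, R, region_dim) CNN features of R image regions
        # question: (B, question_dim) pooled (attention-oriented) word embeddings
        r = self.region_proj(regions)                      # (B, R, H)
        q = self.question_proj(question)                   # (B, H)
        scores = self.attn_score(torch.tanh(r + q.unsqueeze(1))).squeeze(-1)  # (B, R)
        attn = F.softmax(scores, dim=-1)                   # predicted attention map
        attended = (attn.unsqueeze(-1) * r).sum(dim=1)     # (B, H)
        logits = self.classifier(torch.cat([attended, q], dim=-1))
        return logits, attn

def explicit_attention_loss(logits, attn, answers, gt_attn, alpha=1.0):
    # Answer-classification loss plus an explicit attention term that pulls
    # the predicted region distribution toward the ground-truth attention map.
    ce = F.cross_entropy(logits, answers)
    kl = F.kl_div(torch.log(attn + 1e-8), gt_attn, reduction='batchmean')
    return ce + alpha * kl

The explicit attention term (the KL divergence) is what distinguishes this setup from implicitly trained attention, where only the answer loss shapes the attention map.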
Pages: 5
Related Papers (50 in total):
  • [1] Explicit ensemble attention learning for improving visual question answering
    Lioutas, Vasileios
    Passalis, Nikolaos
    Tefas, Anastasios
    PATTERN RECOGNITION LETTERS, 2018, 111 : 51 - 57
  • [2] An Improved Attention for Visual Question Answering
    Rahman, Tanzila
    Chou, Shih-Han
    Sigal, Leonid
    Carenini, Giuseppe
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1653 - 1662
  • [3] Differential Attention for Visual Question Answering
    Patro, Badri
    Namboodiri, Vinay P.
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7680 - 7688
  • [4] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 783 - 792
  • [5] Fusing Attention with Visual Question Answering
    Burt, Ryan
    Cudic, Mihael
    Principe, Jose C.
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 949 - 953
  • [6] Multi-Modal Explicit Sparse Attention Networks for Visual Question Answering
    Guo, Zihan
    Han, Dezhi
    SENSORS, 2020, 20 (23) : 1 - 15
  • [7] Focal Visual-Text Attention for Visual Question Answering
    Liang, Junwei
    Jiang, Lu
    Cao, Liangliang
    Li, Li-Jia
    Hauptmann, Alexander
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6135 - 6143
  • [8] Question-Led object attention for visual question answering
    Gao, Lianli
    Cao, Liangfu
    Xu, Xing
    Shao, Jie
    Song, Jingkuan
    NEUROCOMPUTING, 2020, 391 : 227 - 233
  • [9] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
  • [10] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549