AI-VQA: Visual Question Answering based on Agent Interaction with Interpretability

被引:3
|
作者
Li, Rengang [1 ,2 ]
Xu, Cong [1 ,2 ]
Guo, Zhenhua [3 ]
Fan, Baoyu [3 ]
Zhang, Runze [3 ]
Liu, Wei [3 ]
Zhao, Yaqian [3 ]
Gong, Weifeng [3 ]
Wang, Endong [3 ]
机构
[1] Inspur Beijing Elect Informat Ind Co Ltd, Beijing, Peoples R China
[2] State Key Lab High End Server & Storage Technol, Jinan, Peoples R China
[3] Inspur Elect Informat Ind Co Ltd, State Key Lab High End Server & Storage Technol, Jinan, Peoples R China
关键词
dataset; visual question answer; vision and language;
D O I
10.1145/3503161.3548387
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
Visual Question Answering (VQA) serves as a proxy for evaluating the scene understanding of an intelligent agent by answering questions about images. Most VQA benchmarks to date are focused on those questions that can be answered through understanding visual content in the scene, such as simple counting, visual attributes, and even a little challenging questions that require extra encyclopedic knowledge. However, humans have a remarkable capacity to reason dynamic interaction on the scene, which is beyond the literal content of an image and has not been investigated so far. In this paper, we propose Agent Interaction Visual Question Answering (AI-VQA), a task investigating deep scene understanding if the agent takes a certain action. For this task, a model not only needs to answer action-related questions but also to locate the objects in which the interaction occurs for guaranteeing it truly comprehends the action. Accordingly, we make a new dataset based on Visual Genome and ATOMIC knowledge graph, including more than 19,000 manually annotated questions, and will make it publicly available. Besides, we also provide an annotation of the reasoning path while developing the answer for each question. Based on the dataset, we further propose a novel method, called ARE, that can comprehend the interaction and explain the reason based on a given event knowledge base. Experimental results show that our proposed method outperforms the baseline by a clear margin.
引用
收藏
页码:5274 / 5282
页数:9
相关论文
共 50 条
  • [1] VQA: Visual Question Answering
    Antol, Stanislaw
    Agrawal, Aishwarya
    Lu, Jiasen
    Mitchell, Margaret
    Batra, Dhruv
    Zitnick, C. Lawrence
    Parikh, Devi
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2425 - 2433
  • [2] VQA: Visual Question Answering
    Agrawal, Aishwarya
    Lu, Jiasen
    Antol, Stanislaw
    Mitchell, Margaret
    Zitnick, C. Lawrence
    Parikh, Devi
    Batra, Dhruv
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 123 (01) : 4 - 31
  • [3] VC-VQA: VISUAL CALIBRATION MECHANISM FOR VISUAL QUESTION ANSWERING
    Qiao, Yanyuan
    Yu, Zheng
    Liu, Jing
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1481 - 1485
  • [4] R-VQA: A robust visual question answering model
    Chowdhury, Souvik
    Soni, Badal
    KNOWLEDGE-BASED SYSTEMS, 2025, 309
  • [5] CQ-VQA: Visual Question Answering on Categorized Questions
    Mishra, Aakansha
    Anand, Ashish
    Guha, Prithwijit
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [6] Increasing Interpretability in Outside Knowledge Visual Question Answering
    Upravitelev, Max
    Krauss, Christopher
    Kuhlmann, Isabelle
    KNOWLEDGE MANAGEMENT IN ORGANISATIONS, KMO 2024, 2024, 2152 : 319 - 330
  • [7] CS-VQA: VISUAL QUESTION ANSWERING WITH COMPRESSIVELY SENSED IMAGES
    Huang, Li-Chi
    Kulkarni, Kuldeep
    Jha, Anik
    Lohit, Suhas
    Jayasuriya, Suren
    Turaga, Pavan
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 1283 - 1287
  • [8] Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool
    Liu, Feng
    Xiang, Tao
    Hospedales, Timothy M.
    Yang, Wankou
    Sun, Changyin
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (02) : 460 - 474
  • [9] VQA as a factoid question answering problem: A novel approach for knowledge-aware and explainable visual question answering
    Narayanan, Abhishek
    Rao, Abijna
    Prasad, Abhishek
    Natarajan, S.
    IMAGE AND VISION COMPUTING, 2021, 116
  • [10] SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering
    Xiong, Peixi
    You, Quanzeng
    Yu, Pei
    Liu, Zicheng
    Wu, Ying
    arXiv, 2022,