Question action relevance and editing for visual question answering

Cited by: 11
Authors
Toor, Andeep S. [1 ]
Wechsler, Harry [1 ]
Nappi, Michele [2 ]
Affiliations
[1] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
[2] Univ Salerno, Dipartimento Informat, Fisciano, Italy
Keywords
Computer vision; Visual question answering; Deep learning; Action recognition; Image understanding; Question relevance;
DOI
10.1007/s11042-018-6097-z
CLC number
TP [automation technology; computer technology]
Subject classification
0812
Abstract
Visual Question Answering (VQA) expands on the Turing Test, as it involves the ability to answer questions about visual content. Current efforts in VQA, however, still do not fully consider whether a question about visual content is relevant and, if it is not, how best to edit it to make it answerable. Question relevance has so far been considered only at the level of a whole question, using binary classification and without the capability to edit a question to make it grounded and intelligible. The only exception is our prior research effort into question part relevance, which allows for relevance determination and editing based on object nouns. This paper extends that work on object relevance to determine the relevance of a question action and leverages this capability to edit an irrelevant question to make it relevant. Practical applications of such a capability include answering biometric-related queries across a set of images, including people and their actions (behavioral biometrics). The feasibility of our approach is shown using Context-Collaborative VQA (C2VQA) Action/Relevance/Edit (ARE). Our results show that our proposed approach outperforms all other models by a significant margin on the novel tasks of question action relevance (QAR) and question action editing (QAE). The ultimate goal for future research is to address full-fledged W5+ types of inquiries (What, Where, When, Why, Who, and How) that are grounded to and reference video using both nouns and verbs in a collaborative, context-aware fashion.
Pages: 2921-2935
Page count: 15
Related papers (50 total)
  • [41] VAQA: Visual Arabic Question Answering
    Kamel, Sarah M.
    Hassan, Shimaa I.
    Elrefaei, Lamiaa
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2023, 48 (08) : 10803 - 10823
  • [42] Adapted GoogLeNet for Visual Question Answering
    Huang, Jie
    Hu, Yue
    Yang, Weilong
    2018 3RD INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE), 2018, : 603 - 606
  • [44] Scene Text Visual Question Answering
    Biten, Ali Furkan
    Tito, Ruben
    Mafla, Andres
    Gomez, Lluis
    Rusinol, Marcal
    Valveny, Ernest
    Jawahar, C. V.
    Karatzas, Dimosthenis
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4290 - 4300
  • [45] Semantically Guided Visual Question Answering
    Zhao, Handong
    Fan, Quanfu
    Gutfreund, Dan
    Fu, Yun
    2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1852 - 1860
  • [46] Multitask Learning for Visual Question Answering
    Ma, Jie
    Liu, Jun
    Lin, Qika
    Wu, Bei
    Wang, Yaxian
    You, Yang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (03) : 1380 - 1394
  • [47] Visual Question Answering for Intelligent Interaction
    Gao, Panpan
    Sun, Hanxu
    Chen, Gang
    Wang, Ruiquan
    Li, Minggang
    MOBILE INFORMATION SYSTEMS, 2022, 2022
  • [48] Differential Networks for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Li, Ruifan
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8997 - 9004
  • [49] Document Collection Visual Question Answering
    Tito, Ruben
    Karatzas, Dimosthenis
    Valveny, Ernest
    DOCUMENT ANALYSIS AND RECOGNITION - ICDAR 2021, PT II, 2021, 12822 : 778 - 792
  • [50] Fusing Attention with Visual Question Answering
    Burt, Ryan
    Cudic, Mihael
    Principe, Jose C.
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 949 - 953