Question action relevance and editing for visual question answering

Cited by: 11
Authors
Toor, Andeep S. [1]
Wechsler, Harry [1]
Nappi, Michele [2]
Affiliations
[1] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
[2] Univ Salerno, Dipartimento Informat, Fisciano, Italy
Keywords
Computer vision; Visual question answering; Deep learning; Action recognition; Image understanding; Question relevance
DOI
10.1007/s11042-018-6097-z
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Visual Question Answering (VQA) expands on the Turing Test, as it involves the ability to answer questions about visual content. Current efforts in VQA, however, still do not fully consider whether a question about visual content is relevant and, if it is not, how best to edit it to make it answerable. Question relevance has so far been considered only at the level of a whole question, using binary classification and without the capability to edit a question to make it grounded and intelligible. The only exception is our prior research into question part relevance, which allows for relevance and editing based on object nouns. This paper extends that previous work on object relevance to determine the relevance of a question action and leverages this capability to edit an irrelevant question to make it relevant. Practical applications of such a capability include answering biometric-related queries across a set of images, including people and their actions (behavioral biometrics). The feasibility of our approach is shown using Context-Collaborative VQA (C2VQA) Action/Relevance/Edit (ARE). Our results show that our proposed approach outperforms all other models for the novel tasks of question action relevance (QAR) and question action editing (QAE) by a significant margin. The ultimate goal for future research is to address full-fledged W5+ types of inquiries (What, Where, When, Why, Who, and How) that are grounded to and reference video, using both nouns and verbs in a collaborative, context-aware fashion.
Pages: 2921-2935
Page count: 15