An Empirical Evaluation of Visual Question Answering for Novel Objects

Cited by: 8
Authors
Ramakrishnan, Santhosh K. [1 ,2 ]
Pal, Ambar [1 ]
Sharma, Gaurav [1 ]
Mittal, Anurag [2 ]
Affiliations
[1] IIT Kanpur, Kanpur, Uttar Pradesh, India
[2] IIT Madras, Chennai, Tamil Nadu, India
DOI
10.1109/CVPR.2017.773
CLC Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
We study the problem of answering questions about images in the harder setting where the test questions and corresponding images contain novel objects that were not queried about in the training data. Such a setting is inevitable in the real world: owing to the heavy-tailed distribution of visual categories, some objects will not be annotated in the training set. We show that the performance of two popular existing methods drops significantly (by up to 28%) when evaluated on novel objects compared to known objects. We propose methods which use large existing external corpora of (i) unlabeled text, i.e. books, and (ii) images tagged with classes, to achieve novel-object visual question answering. We conduct systematic empirical studies, both for an oracle case where the novel objects are known textually, and for a fully automatic case without any explicit knowledge of the novel objects, under the minimal assumption that the novel objects are semantically related to the objects present in training. The proposed methods for novel-object visual question answering are modular and can potentially be used with many visual question answering architectures. We show consistent improvements with the two popular architectures and give a qualitative analysis of the cases where the model does well and of those where it fails to bring improvements.
Pages: 7312-7321
Page count: 10
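
One mechanism suggested by the abstract is transferring semantics from a large external text corpus into the question encoder, so that a novel object word never seen during VQA training still has a meaningful representation near related training words. The sketch below illustrates that general idea only; it is not the authors' implementation. The helper names (load_pretrained_vectors, build_embedding_layer), the 300-dimensional vectors, and the random stand-in embeddings are illustrative assumptions.

# Minimal sketch: initialise (and freeze) a VQA question encoder's word
# embeddings from vectors pre-trained on large unlabeled text (e.g. books),
# so novel object words inherit semantics from the external corpus.
# Illustrative only; names and dimensions are assumptions, not from the paper.
import numpy as np
import torch
import torch.nn as nn

EMBED_DIM = 300  # assumed dimensionality of the external word vectors


def load_pretrained_vectors(vocab):
    """Placeholder for loading word2vec/GloVe-style vectors trained on books.

    Returns a dict word -> np.ndarray; random vectors stand in for the real
    pre-trained ones so the sketch stays self-contained.
    """
    rng = np.random.default_rng(0)
    return {w: rng.standard_normal(EMBED_DIM).astype(np.float32) for w in vocab}


def build_embedding_layer(vocab, freeze=True):
    """Build an nn.Embedding whose rows come from the external corpus.

    Freezing the rows keeps novel-object words (which VQA training would never
    update anyway) in the same semantic space as the known training words.
    """
    vectors = load_pretrained_vectors(vocab)
    weight = torch.tensor(np.stack([vectors[w] for w in vocab]))
    return nn.Embedding.from_pretrained(weight, freeze=freeze)


# Toy usage: "zebra" could be a novel object at test time; because its vector
# comes from the external corpus, the question encoder still gets a useful input.
vocab = ["<unk>", "what", "color", "is", "the", "horse", "zebra"]
word_to_idx = {w: i for i, w in enumerate(vocab)}
embed = build_embedding_layer(vocab)
question = torch.tensor([[word_to_idx[w] for w in ["what", "color", "is", "the", "zebra"]]])
print(embed(question).shape)  # -> torch.Size([1, 5, 300])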