RUBi: Reducing Unimodal Biases for Visual Question Answering

Cited by: 0
|
Authors
Cadene, Remi [1 ]
Dancette, Corentin [1 ]
Ben-Younes, Hedi [1 ]
Cord, Matthieu [1 ]
Parikh, Devi [2 ,3 ]
Affiliations
[1] Sorbonne Univ, CNRS, LIP6, 4 Pl Jussieu, F-75005 Paris, France
[2] Facebook AI Res, Menlo Pk, CA 94025 USA
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
DOI
None available
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) is the task of answering questions about an image. VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer a large drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use both input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. It prevents the base VQA model from learning them by influencing its predictions, which amounts to dynamically adjusting the loss to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2, a dataset specifically designed to assess the robustness of VQA models when the question biases at test time differ from those seen during training. Our code is available at github.com/cdancette/rubi.bootstrap.pytorch
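The masking idea described in the abstract (a question-only branch modulates the base model's predictions so that easily biased examples contribute less to the loss) can be sketched minimally in NumPy. The function names, the single-example shapes, and the additive combination of the two losses are illustrative assumptions for this sketch, not the authors' implementation, which is in the linked repository.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_entropy(logits, target):
    # Numerically stable softmax cross-entropy for a single example.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

def rubi_losses(vqa_logits, q_logits, target):
    # Mask the base VQA model's logits with the question-only branch.
    # When the question alone already predicts the answer confidently,
    # the fused logits are dominated by that bias, so the base model
    # receives little gradient for exploiting question-answer regularities.
    fused = vqa_logits * sigmoid(q_logits)
    loss_masked = cross_entropy(fused, target)     # loss on the fused prediction
    loss_q_only = cross_entropy(q_logits, target)  # trains the bias-capturing branch
    return loss_masked + loss_q_only
```

In this sketch only the question-only loss trains the bias branch, while the masked loss drives the base model; a logit for which the question-only branch is uncertain (sigmoid near 0.5 or below) is shrunk toward zero, rebalancing which examples dominate training.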
Pages: 12
Related Papers
50 items total
  • [31] Medical visual question answering: A survey
    Lin, Zhihong
    Zhang, Donghao
    Tao, Qingyi
    Shi, Danli
    Haffari, Gholamreza
    Wu, Qi
    He, Mingguang
    Ge, Zongyuan
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 143
  • [32] Chain of Reasoning for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [33] Visual Question Answering as Reading Comprehension
    Li, Hui
    Wang, Peng
    Shen, Chunhua
    van den Hengel, Anton
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6312 - 6321
  • [34] Revisiting Visual Question Answering Baselines
    Jabri, Allan
    Joulin, Armand
    van der Maaten, Laurens
    COMPUTER VISION - ECCV 2016, PT VIII, 2016, 9912 : 727 - 739
  • [35] Answer Distillation for Visual Question Answering
    Fang, Zhiwei
    Liu, Jing
    Tang, Qu
    Li, Yong
    Lu, Hanqing
    COMPUTER VISION - ACCV 2018, PT I, 2019, 11361 : 72 - 87
  • [36] iVQA: Inverse Visual Question Answering
    Liu, Feng
    Xiang, Tao
    Hospedales, Timothy M.
    Yang, Wankou
    Sun, Changyin
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 8611 - 8619
  • [37] VAQA: Visual Arabic Question Answering
    Kamel, Sarah M. M.
    Hassan, Shimaa I. I.
    Elrefaei, Lamiaa
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2023, 48 (08) : 10803 - 10823
  • [38] Adapted GooLeNet for Visual Question Answering
    Huang, Jie
    Hu, Yue
    Yang, Weilong
    2018 3RD INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE), 2018, : 603 - 606
  • [40] Scene Text Visual Question Answering
    Biten, Ali Furkan
    Tito, Ruben
    Mafla, Andres
    Gomez, Lluis
    Rusinol, Marcal
    Valveny, Ernest
    Jawahar, C. V.
    Karatzas, Dimosthenis
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4290 - 4300