RUBi: Reducing Unimodal Biases for Visual Question Answering

Cited by: 0
|
Authors
Cadene, Remi [1 ]
Dancette, Corentin [1 ]
Ben-Younes, Hedi [1 ]
Cord, Matthieu [1 ]
Parikh, Devi [2 ,3 ]
Affiliations
[1] Sorbonne Univ, CNRS, LIP6, 4 Pl Jussieu, F-75005 Paris, France
[2] Facebook AI Res, Menlo Pk, CA 94025 USA
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
DOI
None available
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) is the task of answering questions about an image. VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer a large drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use both input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. It prevents the base VQA model from learning them by influencing its predictions, which amounts to dynamically adjusting the loss to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2, a dataset specifically designed to assess the robustness of VQA models when the question biases at test time differ from those seen during training. Our code is available at github.com/cdancette/rubi.bootstrap.pytorch
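The masking idea described in the abstract (a question-only branch modulates the base model's predictions so that easily biased examples contribute less to the loss) can be sketched minimally in NumPy. The function names, the single-example shapes, and the additive combination of the two losses are illustrative assumptions for this sketch, not the authors' implementation, which is in the linked repository.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_entropy(logits, target):
    # Numerically stable softmax cross-entropy for a single example.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

def rubi_losses(vqa_logits, q_logits, target):
    # Mask the base VQA model's logits with the question-only branch.
    # When the question alone already predicts the answer confidently,
    # the fused logits are dominated by that bias, so the base model
    # receives little gradient for exploiting question-answer regularities.
    fused = vqa_logits * sigmoid(q_logits)
    loss_masked = cross_entropy(fused, target)     # loss on the fused prediction
    loss_q_only = cross_entropy(q_logits, target)  # trains the bias-capturing branch
    return loss_masked + loss_q_only
```

In this sketch only the question-only loss trains the bias branch, while the masked loss drives the base model; a logit for which the question-only branch is uncertain (sigmoid near 0.5 or below) is shrunk toward zero, rebalancing which examples dominate training.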
Pages: 12
Related Papers
50 items total
  • [31] Medical visual question answering: A survey
    Lin, Zhihong
    Zhang, Donghao
    Tao, Qingyi
    Shi, Danli
    Haffari, Gholamreza
    Wu, Qi
    He, Mingguang
    Ge, Zongyuan
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 143
  • [32] Chain of Reasoning for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [33] Visual Question Answering as Reading Comprehension
    Li, Hui
    Wang, Peng
    Shen, Chunhua
    van den Hengel, Anton
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6312 - 6321
  • [34] Revisiting Visual Question Answering Baselines
    Jabri, Allan
    Joulin, Armand
    van der Maaten, Laurens
    COMPUTER VISION - ECCV 2016, PT VIII, 2016, 9912 : 727 - 739
  • [35] Answer Distillation for Visual Question Answering
    Fang, Zhiwei
    Liu, Jing
    Tang, Qu
    Li, Yong
    Lu, Hanqing
    COMPUTER VISION - ACCV 2018, PT I, 2019, 11361 : 72 - 87
  • [36] iVQA: Inverse Visual Question Answering
    Liu, Feng
    Xiang, Tao
    Hospedales, Timothy M.
    Yang, Wankou
    Sun, Changyin
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 8611 - 8619
  • [37] VAQA: Visual Arabic Question Answering
    Kamel, Sarah M. M.
    Hassan, Shimaa I. I.
    Elrefaei, Lamiaa
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2023, 48 (08) : 10803 - 10823
  • [38] Adapted GooLeNet for Visual Question Answering
    Huang, Jie
    Hu, Yue
    Yang, Weilong
    2018 3RD INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE), 2018, : 603 - 606
  • [40] Scene Text Visual Question Answering
    Biten, Ali Furkan
    Tito, Ruben
    Mafla, Andres
    Gomez, Lluis
    Rusinol, Marcal
    Valveny, Ernest
    Jawahar, C. V.
    Karatzas, Dimosthenis
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4290 - 4300