RUBi: Reducing Unimodal Biases for Visual Question Answering

Cited: 0
Authors
Cadene, Remi [1 ]
Dancette, Corentin [1 ]
Ben-Younes, Hedi [1 ]
Cord, Matthieu [1 ]
Parikh, Devi [2 ,3 ]
Affiliations
[1] Sorbonne Univ, CNRS, LIP6, 4 Pl Jussieu, F-75005 Paris, France
[2] Facebook AI Res, Menlo Pk, CA 94025 USA
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer a large drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use both input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. It prevents the base VQA model from learning them by influencing its predictions. This dynamically adjusts the loss in order to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2. This dataset is specifically designed to assess the robustness of VQA models when exposed to different question biases at test time than those seen during training. Our code is available at: github.com/cdancette/rubi.bootstrap.pytorch
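The masking mechanism the abstract describes can be sketched in a few lines. Below is a minimal NumPy illustration, based only on the abstract's description (scaling the base VQA logits by a sigmoid of the question-only logits, so that examples the question-only branch already answers confidently contribute a smaller loss); all function and variable names here are hypothetical and do not come from the released code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rubi_fusion(vqa_logits, question_only_logits):
    """Scale the base VQA logits by a sigmoid mask computed from the
    question-only branch. On strongly biased examples the mask boosts
    the already-easy answer, so its cross-entropy loss shrinks and the
    example weighs less in training (hypothetical minimal form)."""
    mask = sigmoid(question_only_logits)
    return vqa_logits * mask

def cross_entropy(logits, target_idx):
    # Numerically stable softmax cross-entropy for a single example.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target_idx]

# Toy example: 3 candidate answers, ground truth is index 0.
vqa_logits = np.array([2.0, 0.5, -1.0])
q_only_logits = np.array([3.0, -2.0, -2.0])  # strong language bias toward answer 0

fused = rubi_fusion(vqa_logits, q_only_logits)
# Training optimizes both the fused branch and the question-only branch.
loss = cross_entropy(fused, 0) + cross_entropy(q_only_logits, 0)
```

Because the mask is below 1 everywhere, the fused logits are compressed toward zero wherever the question-only branch is uncertain, and the relative gap between biased and unbiased answers narrows, which is one simple way to realize the "dynamically adjusting the loss" behavior the abstract mentions.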
Pages: 12
Related Papers
50 records in total
  • [1] Reducing Multi-model Biases for Robust Visual Question Answering
    Zhang F.
    Li Y.
    Li X.
    Xu J.
    Chen Y.
    Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2024, 60 (01): : 23 - 33
  • [2] RMLVQA: A Margin Loss Approach For Visual Question Answering with Language Biases
    Basu, Abhipsa
    Addepalli, Sravanti
    Babu, R. Venkatesh
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11671 - 11680
  • [3] Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
    Dancette, Corentin
    Cadene, Remi
    Teney, Damien
    Cord, Matthieu
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1554 - 1563
  • [4] Visual Question Answering
    Nada, Ahmed
    Chen, Min
    2024 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2024, : 6 - 10
  • [5] Question Modifiers in Visual Question Answering
    Britton, William
    Sarkhel, Somdeb
    Venugopal, Deepak
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1472 - 1479
  • [6] Towards bias-aware visual question answering: Rectifying and mitigating comprehension biases
    Chen, Chongqing
    Han, Dezhi
    Guo, Zihan
    Chang, Chin-Chen
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 264
  • [7] VQA: Visual Question Answering
    Antol, Stanislaw
    Agrawal, Aishwarya
    Lu, Jiasen
    Mitchell, Margaret
    Batra, Dhruv
    Zitnick, C. Lawrence
    Parikh, Devi
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2425 - 2433
  • [8] Indic Visual Question Answering
    Chandrasekar, Aditya
    Shimpi, Amey
    Naik, Dinesh
    2022 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, SPCOM, 2022,
  • [9] VQA: Visual Question Answering
    Agrawal, Aishwarya
    Lu, Jiasen
    Antol, Stanislaw
    Mitchell, Margaret
    Zitnick, C. Lawrence
    Parikh, Devi
    Batra, Dhruv
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 123 (01) : 4 - 31
  • [10] Survey on Visual Question Answering
    Bao X.-G.
    Zhou C.-L.
    Xiao K.-J.
    Qin B.
    Ruan Jian Xue Bao/Journal of Software, 2021, 32 (08): : 2522 - 2544