RUBi: Reducing Unimodal Biases for Visual Question Answering

Citations: 0

Authors
Cadene, Remi [1 ]
Dancette, Corentin [1 ]
Ben-Younes, Hedi [1 ]
Cord, Matthieu [1 ]
Parikh, Devi [2 ,3 ]
Affiliations
[1] Sorbonne Univ, CNRS, LIP6, 4 Pl Jussieu, F-75005 Paris, France
[2] Facebook AI Res, Menlo Pk, CA 94025 USA
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering (VQA) is the task of answering questions about an image. Many VQA models exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a large drop in performance when evaluated on data outside their training-set distribution, a critical issue that makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e., examples that can be correctly classified without looking at the image, and implicitly forces the VQA model to use both input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used, and prevent the base VQA model from learning them by influencing its predictions. This amounts to dynamically adjusting the loss to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2, a dataset specifically designed to assess the robustness of VQA models when exposed to different question biases at test time than those seen during training. Our code is available at github.com/cdancette/rubi.bootstrap.pytorch
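The loss-adjustment strategy described in the abstract can be sketched in PyTorch. This is a minimal illustration of the idea (mask the base model's predictions with a question-only branch so that heavily biased examples contribute less gradient), not the authors' released implementation; the class and variable names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RUBiCriterion(nn.Module):
    """Sketch of a RUBi-style training criterion (illustrative, assumed names).

    logits:   answer scores from the base VQA model, shape (batch, num_answers)
    q_logits: answer scores from a question-only branch, same shape
    labels:   ground-truth answer indices, shape (batch,)
    """

    def forward(self, logits, q_logits, labels):
        # Mask the base model's logits with the question-only branch.
        # When the question alone predicts the answer confidently (a biased
        # example), the mask flattens the fused distribution less around that
        # answer, reducing the loss gradient reaching the base model.
        fused = logits * torch.sigmoid(q_logits)
        loss_vqa = F.cross_entropy(fused, labels)

        # The question-only branch is also trained to predict the answer,
        # so it keeps capturing the language biases it is meant to expose.
        loss_q = F.cross_entropy(q_logits, labels)

        return loss_vqa + loss_q
```

In use, `logits` and `q_logits` would come from two heads sharing the question encoder; the sum of the two cross-entropy terms is what makes the overall loss adjust dynamically per example.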
Pages: 12
Related Papers
(50 total)
  • [21] Question action relevance and editing for visual question answering
    Toor, Andeep S.
    Wechsler, Harry
    Nappi, Michele
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 2921 - 2935
  • [22] Multi-Question Learning for Visual Question Answering
    Lei, Chenyi
    Wu, Lei
    Liu, Dong
    Li, Zhao
    Wang, Guoxin
    Tang, Haihong
    Li, Houqiang
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11328 - 11335
  • [23] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
  • [24] Differential Attention for Visual Question Answering
    Patro, Badri
    Namboodiri, Vinay P.
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7680 - 7688
  • [25] Structured Attentions for Visual Question Answering
    Zhu, Chen
    Zhao, Yanpeng
    Huang, Shuaiyi
    Tu, Kewei
    Ma, Yi
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 1300 - 1309
  • [26] An Analysis of Visual Question Answering Algorithms
    Kafle, Kushal
    Kanan, Christopher
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 1983 - 1991
  • [27] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 783 - 792
  • [28] Affective Visual Question Answering Network
    Ruwa, Nelson
    Mao, Qirong
    Wang, Liangjun
    Dong, Ming
    IEEE 1ST CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2018), 2018, : 170 - 173
  • [29] Visual Question Answering with Question Representation Update (QRU)
    Li, Ruiyu
    Jia, Jiaya
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [30] Visual Question Answering on 360° Images
    Chou, Shih-Han
    Chao, Wei-Lun
    Lai, Wei-Sheng
    Sun, Min
    Yang, Ming-Hsuan
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1596 - 1605