VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING

Cited by: 3
Authors
Lao, Mingrui [1 ]
Guo, Yanming [2 ]
Chen, Wei [1 ]
Pu, Nan [1 ]
Lew, Michael S. [1 ]
Affiliations
[1] Leiden Univ, LIACS Medialab, Leiden, Netherlands
[2] Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China
Keywords
Visual question answering; language bias; forward/backward chaining; label smoothing;
DOI
10.1109/ICASSP43922.2022.9746493
CLC classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Current VQA models suffer from over-dependence on language bias, which severely reduces their robustness in real-world scenarios. In this paper, we analyze VQA models from the perspective of forward/backward chaining in an inference engine, and propose to enhance their robustness via a novel Bidirectional Chaining (VQA-BC) framework. Specifically, we introduce backward chaining with hard-negative contrastive learning to reason from the consequence (answers) back to the crucial known facts (question-related visual region features). Furthermore, to alleviate the over-confidence problem in answer prediction (forward chaining), we present a novel introspective regularization that connects forward and backward chaining through label smoothing. Extensive experiments verify that VQA-BC not only effectively overcomes language bias on the out-of-distribution dataset, but also alleviates the over-correction problem caused by ensemble-based methods on the in-distribution dataset. Compared with competitive debiasing strategies, our method achieves state-of-the-art performance in reducing language bias on the VQA-CP v2 dataset.
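The abstract names two concrete ingredients: a hard-negative contrastive objective for the backward chain (answer-to-region reasoning) and a label-smoothing regularization on the forward chain (answer prediction). The PyTorch sketch below illustrates both losses under stated assumptions; all tensor shapes, variable names, and the temperature/smoothing values are illustrative, not the authors' implementation, and the paper's introspective regularization (which couples the two chains) is only approximated by plain label smoothing here.

    # Minimal sketch (PyTorch); names and hyperparameters are assumptions.
    import torch
    import torch.nn.functional as F

    def hard_negative_contrastive_loss(anchor, positive, negatives, tau=0.1):
        """Backward-chaining objective (sketch): pull the answer-conditioned
        feature (anchor) towards the question-related region feature of the
        ground-truth answer (positive) and push it away from hard negatives,
        e.g. regions attended for high-scoring wrong answers.

        anchor:    (B, D)    features reasoned back from the answer
        positive:  (B, D)    question-related region features
        negatives: (B, K, D) hard-negative region features
        """
        anchor = F.normalize(anchor, dim=-1)
        positive = F.normalize(positive, dim=-1)
        negatives = F.normalize(negatives, dim=-1)

        pos_sim = (anchor * positive).sum(-1, keepdim=True) / tau      # (B, 1)
        neg_sim = torch.einsum('bd,bkd->bk', anchor, negatives) / tau  # (B, K)
        logits = torch.cat([pos_sim, neg_sim], dim=1)                  # (B, 1+K)
        # The positive sits at index 0 of every row.
        targets = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
        return F.cross_entropy(logits, targets)

    def smoothed_answer_loss(logits, target, eps=0.1):
        """Forward-chaining answer loss with label smoothing (sketch):
        softening the one-hot answer target is one way to counteract
        over-confident predictions.
        """
        num_classes = logits.size(-1)
        log_probs = F.log_softmax(logits, dim=-1)
        smooth = torch.full_like(log_probs, eps / (num_classes - 1))
        smooth.scatter_(1, target.unsqueeze(1), 1.0 - eps)
        return -(smooth * log_probs).sum(dim=-1).mean()

In a training loop, the two losses would typically be combined as a weighted sum with the standard VQA answer loss; the weighting is a design choice not specified in the abstract.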
Pages: 4833 - 4837
Page count: 5
Related Papers
50 records in total
  • [31] BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining
    Kim, MinJun
    Song, SeungWoo
    Lee, YouHan
    Jang, Haneol
    Lim, KyungTae
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18381 - 18389
  • [32] Event-Oriented Visual Question Answering: The E-VQA Dataset and Benchmark
    Yang, Zhenguo
    Xiang, Jiale
    You, Jiuxiang
    Li, Qing
    Liu, Wenyin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (10) : 10210 - 10223
  • [33] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
    Goyal, Yash
    Khot, Tejas
    Agrawal, Aishwarya
    Summers-Stay, Douglas
    Batra, Dhruv
    Parikh, Devi
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2019, 127 : 398 - 414
  • [34] Cycle-Consistency for Robust Visual Question Answering
    Shah, Meet
    Chen, Xinlei
    Rohrbach, Marcus
    Parikh, Devi
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6642 - 6651
  • [35] On the role of question encoder sequence model in robust visual question answering
    Kv, Gouthaman
    Mittal, Anurag
    PATTERN RECOGNITION, 2022, 131
  • [36] Rethinking Data Augmentation for Robust Visual Question Answering
    Chen, Long
    Zheng, Yuhang
    Xiao, Jun
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 95 - 112
  • [37] Fair Attention Network for Robust Visual Question Answering
    Bi, Yandong
    Jiang, Huajie
    Hu, Yongli
    Sun, Yanfeng
    Yin, Baocai
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09) : 7870 - 7881
  • [38] Greedy Gradient Ensemble for Robust Visual Question Answering
    Han, Xinzhe
    Wang, Shuhui
    Su, Chi
    Huang, Qingming
    Tian, Qi
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1564 - 1573
  • [39] Overcoming Language Priors via Shuffling Language Bias for Robust Visual Question Answering
    Zhao, J.
    Yu, Z.
    Zhang, X.
    Yang, Y.
    IEEE ACCESS, 2023, 11 : 85980 - 85989
  • [40] RESCUENET-VQA: A LARGE-SCALE VISUAL QUESTION ANSWERING BENCHMARK FOR DAMAGE ASSESSMENT
    Sarkar, Argho
    Rahnemoonfar, Maryam
    IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023, : 1150 - 1153