VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING

Cited by: 3
Authors
Lao, Mingrui [1 ]
Guo, Yanming [2 ]
Chen, Wei [1 ]
Pu, Nan [1 ]
Lew, Michael S. [1 ]
Affiliations
[1] Leiden Univ, LIACS Medialab, Leiden, Netherlands
[2] Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China
Keywords
Visual question answering; language bias; forward/backward chaining; label smoothing
DOI
10.1109/ICASSP43922.2022.9746493
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Current VQA models suffer from over-dependence on language bias, which severely reduces their robustness in real-world scenarios. In this paper, we analyze VQA models from the perspective of forward/backward chaining in an inference engine, and propose to enhance their robustness via a novel Bidirectional Chaining (VQA-BC) framework. Specifically, we introduce backward chaining with hard-negative contrastive learning to reason from the consequence (answers) to generate the crucial known facts (question-related visual region features). Furthermore, to alleviate the over-confidence problem in answer prediction (forward chaining), we present a novel introspective regularization that connects forward and backward chaining via label smoothing. Extensive experiments verify that VQA-BC not only effectively overcomes language bias on an out-of-distribution dataset, but also alleviates the over-correction problem caused by ensemble-based methods on an in-distribution dataset. Compared with competitive debiasing strategies, our method achieves state-of-the-art performance in reducing language bias on the VQA-CP v2 dataset.
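The abstract cites label smoothing as the mechanism that links forward and backward chaining and tempers over-confident answer prediction. As a point of reference only, the sketch below shows generic label smoothing applied to a VQA answer classifier in PyTorch; the function name, the uniform smoothing scheme, and epsilon = 0.1 are illustrative assumptions and do not reproduce the paper's introspective regularization.

```python
import torch
import torch.nn.functional as F

def smoothed_answer_loss(logits, target_idx, epsilon=0.1):
    """Cross-entropy against a label-smoothed answer distribution.

    logits:     [batch, num_answers] raw scores from the VQA answer head
    target_idx: [batch] index of the annotated ground-truth answer
    epsilon:    probability mass spread uniformly over the other answers
    """
    num_answers = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)

    # Smoothed target: 1 - epsilon on the ground-truth answer,
    # epsilon / (K - 1) on every other answer in the vocabulary.
    smooth = torch.full_like(log_probs, epsilon / (num_answers - 1))
    smooth.scatter_(-1, target_idx.unsqueeze(-1), 1.0 - epsilon)

    return -(smooth * log_probs).sum(dim=-1).mean()

# Hypothetical usage with a 3000-way answer vocabulary
logits = torch.randn(8, 3000)
targets = torch.randint(0, 3000, (8,))
loss = smoothed_answer_loss(logits, targets)
```

Compared with a one-hot cross-entropy target, the smoothed distribution keeps the predicted answer probabilities from saturating, which is the general motivation the abstract gives for regularizing the forward-chaining (answer prediction) branch.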
Pages: 4833-4837
Number of pages: 5
Related Papers
50 items in total
  • [21] OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
    Marino, Kenneth
    Rastegari, Mohammad
    Farhadi, Ali
    Mottaghi, Roozbeh
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3190 - 3199
  • [22] Bidirectional Contrastive Split Learning for Visual Question Answering
    Sun, Yuwei
    Ochiai, Hideya
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 19, 2024, : 21602 - 21609
  • [23] Adversarial Learning with Bidirectional Attention for Visual Question Answering
    Li, Qifeng
    Tang, Xinyi
    Jian, Yi
    SENSORS, 2021, 21 (21)
  • [24] R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
    Lu, Pan
    Ji, Lei
    Zhang, Wei
    Duan, Nan
    Zhou, Ming
    Wang, Jianyong
    KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 1880 - 1889
  • [25] Cross Modality Bias in Visual Question Answering: A Causal View With Possible Worlds VQA
    Vosoughi, Ali
    Deng, Shijian
    Zhang, Songyang
    Tian, Yapeng
    Xu, Chenliang
    Luo, Jiebo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8609 - 8624
  • [26] Generative Bias for Robust Visual Question Answering
    Cho, Jae Won
    Kim, Dong-Jin
    Ryu, Hyeonggon
    Kweon, In So
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11681 - 11690
  • [27] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
    Goyal, Yash
    Khot, Tejas
    Summers-Stay, Douglas
    Batra, Dhruv
    Parikh, Devi
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6325 - 6334
  • [28] WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
    Chen, Pingyi
    Zhu, Chenglu
    Zheng, Sunyi
    Li, Honglin
    Yang, Lin
    COMPUTER VISION - ECCV 2024, PT XXXVI, 2025, 15094 : 401 - 417
  • [29] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
    Goyal, Yash
    Khot, Tejas
    Agrawal, Aishwarya
    Summers-Stay, Douglas
    Batra, Dhruv
    Parikh, Devi
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2019, 127 (04) : 398 - 414
  • [30] Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering
    Naik, Nandita
    Potts, Christopher
    Kreiss, Elisa
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2813 - 2817