VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING

Cited by: 3

Authors:

Lao, Mingrui [1]

Guo, Yanming [2]

Chen, Wei [1]

Pu, Nan [1]

Lew, Michael S. [1]

Affiliations:

[1] Leiden Univ, LIACS Medialab, Leiden, Netherlands

[2] Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China

Source:

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022

Keywords:

Visual question answering; language bias; forward/backward chaining; label smoothing;

DOI:

10.1109/ICASSP43922.2022.9746493

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Current VQA models suffer from over-dependence on language bias, which severely reduces their robustness in real-world scenarios. In this paper, we analyze VQA models from the view of forward/backward chaining in the inference engine, and propose to enhance their robustness via a novel Bidirectional Chaining (VQA-BC) framework. Specifically, we introduce backward chaining with hard-negative contrastive learning to reason from the consequence (answers) to generate crucial known facts (question-related visual region features). Furthermore, to alleviate the over-confidence problem in answer prediction (forward chaining), we present a novel introspective regularization to connect forward and backward chaining with label smoothing. Extensive experiments verify that VQA-BC not only effectively overcomes language bias on the out-of-distribution dataset, but also alleviates the over-correction problem caused by ensemble-based methods on the in-distribution dataset. Compared with competitive debiasing strategies, our method achieves state-of-the-art performance in reducing language bias on the VQA-CP v2 dataset.


Pages: 4833-4837

Page count: 5
