VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING

Times cited: 3
Authors
Lao, Mingrui [1]
Guo, Yanming [2]
Chen, Wei [1]
Pu, Nan [1]
Lew, Michael S. [1]
Affiliations
[1] Leiden Univ, LIACS Medialab, Leiden, Netherlands
[2] Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China
Keywords
Visual question answering; language bias; forward/backward chaining; label smoothing
DOI
10.1109/ICASSP43922.2022.9746493
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Current VQA models suffer from over-reliance on language bias, which severely reduces their robustness in real-world scenarios. In this paper, we analyze VQA models from the perspective of forward/backward chaining in an inference engine, and propose to enhance their robustness with a novel Bidirectional Chaining (VQA-BC) framework. Specifically, we introduce backward chaining with hard-negative contrastive learning, which reasons from the consequence (answers) to generate the crucial known facts (question-related visual region features). Furthermore, to alleviate over-confidence in answer prediction (forward chaining), we present a novel introspective regularization that connects forward and backward chaining through label smoothing. Extensive experiments verify that VQA-BC not only effectively overcomes language bias on the out-of-distribution dataset, but also alleviates the over-correction problem caused by ensemble-based methods on the in-distribution dataset. Compared with competitive debiasing strategies, our method achieves state-of-the-art performance in reducing language bias on the VQA-CP v2 dataset.
Pages: 4833-4837
Number of pages: 5
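The abstract names two standard training ingredients, label smoothing and hard-negative contrastive learning, without giving their formulas. The sketch below shows generic textbook forms of both in plain Python, purely to illustrate the terms. It is not the VQA-BC implementation: the uniform smoothing rule y' = (1 - ε)y + ε/K, the InfoNCE form of the contrastive loss, the use of cosine similarity, the "keep the top-k most similar negatives" mining rule, the parameters `epsilon`, `num_hard` and `temperature`, and all function names are assumptions chosen for illustration. The paper's introspective regularization, which links the two chains, is not reproduced here.

```python
import math
from typing import List, Sequence

# Generic sketch (assumption): textbook label smoothing and a hard-negative
# InfoNCE loss. This is NOT the VQA-BC code; all names here are hypothetical.


def smooth_labels(target: Sequence[float], epsilon: float) -> List[float]:
    """Mix the target with a uniform distribution: (1 - eps) * y + eps / K."""
    k = len(target)
    return [(1.0 - epsilon) * y + epsilon / k for y in target]


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_cross_entropy(logits: Sequence[float], target: Sequence[float],
                           epsilon: float = 0.1) -> float:
    """Cross-entropy between softmax(logits) and a label-smoothed target."""
    log_p = log_softmax(logits)
    y = smooth_labels(target, epsilon)
    return -sum(yi * lp for yi, lp in zip(y, log_p))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hard_negative_infonce(anchor: Sequence[float], positive: Sequence[float],
                          negatives: Sequence[Sequence[float]],
                          num_hard: int = 2, temperature: float = 0.1) -> float:
    """InfoNCE loss using only the `num_hard` negatives closest to the anchor."""
    # Hard-negative mining (assumed rule): keep the most similar negatives.
    neg_sims = sorted((cosine(anchor, n) for n in negatives), reverse=True)[:num_hard]
    pos_logit = cosine(anchor, positive) / temperature
    neg_logits = [s / temperature for s in neg_sims]
    # -log(exp(pos) / (exp(pos) + sum(exp(neg)))) is the negated log-softmax
    # of the positive logit over [positive] + negatives.
    return -log_softmax([pos_logit] + neg_logits)[0]


if __name__ == "__main__":
    # A confident, correct prediction is penalized more once targets are smoothed.
    logits, one_hot = [5.0, 0.0, 0.0], [1.0, 0.0, 0.0]
    print(smoothed_cross_entropy(logits, one_hot, epsilon=0.0))
    print(smoothed_cross_entropy(logits, one_hot, epsilon=0.1))

    # A negative lying close to the anchor dominates the contrastive loss.
    anchor, positive = [1.0, 0.0], [1.0, 0.0]
    negatives = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.1]]
    print(hard_negative_infonce(anchor, positive, negatives, num_hard=1))
```

In this sketch, smoothing moves probability mass away from the labeled answer, so the loss keeps penalizing an already confident prediction; that is the general sense in which label smoothing counters over-confidence. Keeping only the most similar negatives concentrates the contrastive signal on the examples that are hardest to separate from the positive.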