VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING

Cited by: 3
Authors
Lao, Mingrui [1]
Guo, Yanming [2]
Chen, Wei [1]
Pu, Nan [1]
Lew, Michael S. [1]
Affiliations
[1] Leiden Univ, LIACS Medialab, Leiden, Netherlands
[2] Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China
Keywords
Visual question answering; language bias; forward/backward chaining; label smoothing
DOI
10.1109/ICASSP43922.2022.9746493
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Current VQA models depend too heavily on language bias, which severely reduces their robustness in real-world scenarios. In this paper, we analyze VQA models from the viewpoint of forward/backward chaining in an inference engine, and propose to enhance their robustness via a novel Bidirectional Chaining (VQA-BC) framework. Specifically, we introduce backward chaining with hard-negative contrastive learning, which reasons from the consequence (answers) to generate the crucial known facts (question-related visual region features). Furthermore, to alleviate over-confidence in answer prediction (forward chaining), we present a novel introspective regularization that connects forward and backward chaining through label smoothing. Extensive experiments verify that VQA-BC not only effectively overcomes language bias on the out-of-distribution dataset, but also alleviates the over-correction problem caused by ensemble-based methods on the in-distribution dataset. Compared with competitive debiasing strategies, our method achieves state-of-the-art performance in reducing language bias on the VQA-CP v2 dataset.
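The abstract names label smoothing as the link between forward and backward chaining but does not give its form. The following is a minimal sketch of standard uniform label smoothing only, not the paper's introspective regularization. It assumes a single ground-truth answer per question, and the names `smooth_labels`, `cross_entropy`, and the value `epsilon=0.1` are hypothetical choices made for illustration.

```python
import math


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Uniform label smoothing (assumed form, not taken from the paper).

    The true class gets (1 - epsilon) + epsilon / K and every other
    class gets epsilon / K, so the result still sums to 1.
    """
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index out of range")
    off_value = epsilon / num_classes
    return [
        (1.0 - epsilon) + off_value if i == target_index else off_value
        for i in range(num_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_dist):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_dist, probs) if t > 0)


# An over-confident prediction for answer index 1 out of 4 candidates.
logits = [0.0, 10.0, 0.0, 0.0]
hard_target = [0.0, 1.0, 0.0, 0.0]
soft_target = smooth_labels(1, 4, epsilon=0.1)

hard_loss = cross_entropy(logits, hard_target)
soft_loss = cross_entropy(logits, soft_target)
```

With the smoothed target, the same over-confident logits incur a larger loss than with the one-hot target, which is the general mechanism by which label smoothing discourages over-confident answer prediction.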
Pages: 4833-4837
Number of pages: 5
Related papers
50 in total
  • [1] VQA: Visual Question Answering
    Antol, Stanislaw
    Agrawal, Aishwarya
    Lu, Jiasen
    Mitchell, Margaret
    Batra, Dhruv
    Zitnick, C. Lawrence
    Parikh, Devi
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2425 - 2433
  • [2] VQA: Visual Question Answering
    Agrawal, Aishwarya
    Lu, Jiasen
    Antol, Stanislaw
    Mitchell, Margaret
    Zitnick, C. Lawrence
    Parikh, Devi
    Batra, Dhruv
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2017, 123 (01) : 4 - 31
  • [3] R-VQA: A robust visual question answering model
    Chowdhury, Souvik
    Soni, Badal
    KNOWLEDGE-BASED SYSTEMS, 2025, 309
  • [4] VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task
    Bi, Yandong
    Jiang, Huajie
    Liu, Jing
    Liu, Mengting
    Hu, Yongli
    Yin, Baocai
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873 : 264 - 277
  • [5] VC-VQA: VISUAL CALIBRATION MECHANISM FOR VISUAL QUESTION ANSWERING
    Qiao, Yanyuan
    Yu, Zheng
    Liu, Jing
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1481 - 1485
  • [6] Cycle-VQA: A Cycle-Consistent Framework for Robust Medical Visual Question Answering
    Fan, Lin
    Gong, Xun
    Zheng, Cenyang
    Tan, Xuli
    Li, Jiao
    Ou, Yafei
    PATTERN RECOGNITION, 2025, 165
  • [7] CQ-VQA: Visual Question Answering on Categorized Questions
    Mishra, Aakansha
    Anand, Ashish
    Guha, Prithwijit
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [8] Robust visual question answering via polarity enhancement and contrast
    Peng, Dahe
    Li, Zhixin
    NEURAL NETWORKS, 2024, 179
  • [9] CS-VQA: VISUAL QUESTION ANSWERING WITH COMPRESSIVELY SENSED IMAGES
    Huang, Li-Chi
    Kulkarni, Kuldeep
    Jha, Anik
    Lohit, Suhas
    Jayasuriya, Suren
    Turaga, Pavan
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 1283 - 1287
  • [10] Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool
    Liu, Feng
    Xiang, Tao
    Hospedales, Timothy M.
    Yang, Wankou
    Sun, Changyin
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (02) : 460 - 474