Towards bias-aware visual question answering: Rectifying and mitigating comprehension biases

Times cited: 0
Authors
Chen, Chongqing [1]
Han, Dezhi [1]
Guo, Zihan [2]
Chang, Chin-Chen [3]
Affiliations
[1] Shanghai Maritime University, School of Information Engineering, Shanghai 201306, People's Republic of China
[2] Changzhi University, Department of Computer Science, Changzhi 046011, Shanxi, People's Republic of China
[3] Feng Chia University, Department of Information Engineering and Computer Science, Taichung 407, Taiwan
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China
Keywords
Comprehension biases; Relational dependency modeling; Visual question answering (VQA); Inference capability; Contextual information; Self-attention networks; Language
DOI
10.1016/j.eswa.2024.125817
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformers have become essential for capturing intra- and inter-dependencies in visual question answering (VQA). Yet challenges remain in overcoming inherent comprehension biases and in improving the relational dependency modeling and reasoning capabilities that VQA tasks require. This paper presents RMCB, a novel VQA model designed to mitigate these biases by integrating contextual information from both visual and linguistic sources and by addressing potential comprehension limitations on each side. For language tokens, RMCB enhances relational modeling by leveraging textual context, which addresses comprehension biases that arise when token relationships are modeled in isolated pairs. For the visual component, RMCB systematically incorporates both absolute and relative spatial relational information as contextual cues for image tokens, refining dependency modeling and strengthening inferential reasoning to alleviate biases caused by limited contextual understanding. The model was evaluated on the benchmark datasets VQA-v2 and CLEVR, where it achieved state-of-the-art accuracies of 71.78% and 99.27%, respectively. These results indicate that RMCB effectively addresses comprehension biases while advancing the relational reasoning that VQA requires.
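The abstract names two kinds of context, textual context for word tokens and absolute plus relative spatial cues for image tokens, without giving formulas. As a rough illustration only, the Python sketch below shows generic versions of these ingredients; it is not the authors' implementation. Everything in it is an assumption: the function names, the (x1, y1, x2, y2) box format, the log-ratio relative geometry (the encoding popularized by relation networks for object detection), and the fixed mixing weight `lam`, a simplified stand-in for the learned context gates used in context-aware self-attention.

```python
import math
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def absolute_features(box: Box, img_w: float, img_h: float) -> List[float]:
    """Absolute spatial cue: corners normalized by image size, plus area fraction."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1) / (img_w * img_h)
    return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h, area]


def relative_features(box_m: Box, box_n: Box, eps: float = 1e-3) -> List[float]:
    """Relative spatial cue of box n w.r.t. box m: log-ratios of center
    offsets and sizes (an assumed encoding, not necessarily RMCB's)."""
    def center_size(b: Box) -> Tuple[float, float, float, float]:
        x1, y1, x2, y2 = b
        return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1

    xm, ym, wm, hm = center_size(box_m)
    xn, yn, wn, hn = center_size(box_n)
    return [
        # Offsets are clamped at eps so coincident centers do not give log(0).
        math.log(max(abs(xm - xn) / wm, eps)),
        math.log(max(abs(ym - yn) / hm, eps)),
        math.log(wn / wm),
        math.log(hn / hm),
    ]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def context_aware_attention(tokens: List[List[float]], lam: float = 0.5) -> List[List[float]]:
    """Single-head self-attention over word vectors where queries and keys are
    blended with the sentence mean, so each pairwise score also sees global
    textual context. Learned projections are omitted; lam is a fixed,
    hypothetical mixing weight in [0, 1]."""
    n, d = len(tokens), len(tokens[0])
    ctx = [sum(t[k] for t in tokens) / n for k in range(d)]
    mixed = [[(1 - lam) * t[k] + lam * ctx[k] for k in range(d)] for t in tokens]
    out = []
    for i in range(n):
        scores = [
            sum(mixed[i][k] * mixed[j][k] for k in range(d)) / math.sqrt(d)
            for j in range(n)
        ]
        weights = softmax(scores)
        # Values are the original token vectors, so each output is a convex mix of them.
        out.append([sum(w * tokens[j][k] for j, w in enumerate(weights)) for k in range(d)])
    return out


if __name__ == "__main__":
    boxes = [(10, 20, 60, 120), (80, 40, 180, 140)]
    print(absolute_features(boxes[0], img_w=200, img_h=200))
    print(relative_features(boxes[0], boxes[1]))
    print(context_aware_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], lam=0.3))
```

Running the module prints the five absolute features of the first box, the four relative features of the second box with respect to the first, and three context-mixed word vectors. With `lam = 0` the attention reduces to plain pairwise scoring; raising `lam` adds a sentence-level term to every score. In a full model, such geometry would typically be projected and injected into the visual attention logits or embeddings; the abstract does not say how RMCB does this.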
Pages: 14