VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge

Cited by: 23
Authors:
Ravi, Sahithya [1,2]
Chinchure, Aditya [1,2]
Sigal, Leonid [1,2]
Liao, Renjie [1]
Shwartz, Vered [1,2]
Affiliations:
[1] University of British Columbia, Vancouver, BC, Canada
[2] Vector Institute for AI, Toronto, ON, Canada
Funding:
Natural Sciences and Engineering Research Council of Canada (NSERC)
DOI: 10.1109/WACV56688.2023.00121
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
There has been a growing interest in solving Visual Question Answering (VQA) tasks that require the model to reason beyond the content present in the image. In this work, we focus on questions that require commonsense reasoning. In contrast to previous methods which inject knowledge from static knowledge bases, we investigate the incorporation of contextualized knowledge using Commonsense Transformer (COMET), an existing knowledge model trained on human-curated knowledge bases. We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues in a new pre-trained Vision-Language-Commonsense transformer model, VLC-BERT. Through our evaluation on the knowledge-intensive OK-VQA and A-OKVQA datasets, we show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases. Furthermore, through a detailed analysis, we explain which questions benefit, and which don't, from contextualized commonsense knowledge from COMET. Code: https://github.com/aditya10/VLC-BERT
Pages: 1155-1165 (11 pages)
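
The abstract describes a generate-select-encode pipeline: commonsense inferences are generated with COMET, the most relevant ones are selected, and the selected text is encoded alongside the visual and textual inputs of the transformer. Below is a minimal, hypothetical sketch of the generate-and-select stages only. The comet_generate stub stands in for a trained COMET knowledge model, and the sentence-transformers encoder is used purely to illustrate similarity-based selection; neither the function names nor the specific encoder are taken from the paper.

```python
# Hypothetical sketch of the "generate -> select" stages described in the abstract.
# comet_generate is a placeholder; in practice it would query a trained COMET checkpoint.
from sentence_transformers import SentenceTransformer, util


def comet_generate(question: str):
    """Placeholder for a COMET knowledge model: returns candidate commonsense
    inferences for the question under several relation types (stubbed here)."""
    return [
        "people ride a bus to get to work",
        "a bus is usually found on a street",
        "the person wants to travel somewhere",
    ]


def select_inferences(question: str, inferences, top_k: int = 2):
    """Rank candidate inferences by cosine similarity to the question and keep the top-k."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder, not from the paper
    q_emb = encoder.encode(question, convert_to_tensor=True)
    cand_emb = encoder.encode(inferences, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, cand_emb)[0]
    ranked = sorted(zip(inferences, scores.tolist()), key=lambda pair: -pair[1])
    return [text for text, _ in ranked[:top_k]]


if __name__ == "__main__":
    question = "Why are the people standing at the side of the road?"
    candidates = comet_generate(question)
    selected = select_inferences(question, candidates)
    # The selected inferences would then be appended as additional text tokens
    # to the question before being fed to the vision-language transformer.
    print(selected)
```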
    2022 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING, MLNLP 2022, 2022, : 35 - 40