VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge

Cited by: 23
Authors
Ravi, Sahithya [1 ,2 ]
Chinchure, Aditya [1 ,2 ]
Sigal, Leonid [1 ,2 ]
Liao, Renjie [1 ]
Shwartz, Vered [1 ,2 ]
Affiliations
[1] Univ British Columbia, Vancouver, BC, Canada
[2] Vector Inst AI, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
DOI
10.1109/WACV56688.2023.00121
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
There has been a growing interest in solving Visual Question Answering (VQA) tasks that require the model to reason beyond the content present in the image. In this work, we focus on questions that require commonsense reasoning. In contrast to previous methods which inject knowledge from static knowledge bases, we investigate the incorporation of contextualized knowledge using Commonsense Transformer (COMET), an existing knowledge model trained on human-curated knowledge bases. We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues in a new pre-trained Vision-Language-Commonsense transformer model, VLC-BERT. Through our evaluation on the knowledge-intensive OK-VQA and A-OKVQA datasets, we show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases. Furthermore, through a detailed analysis, we explain which questions benefit, and which don't, from contextualized commonsense knowledge from COMET. Code: https://github.com/aditya10/VLC-BERT
Pages: 1155-1165
Page count: 11
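
The listing below is a rough, non-authoritative sketch of the generate-and-select step described in the abstract: prompting a COMET-style knowledge model with a question and a handful of relations, then keeping the inferences most similar to the question before they would be encoded alongside the visual and textual cues. It is not the authors' implementation (see the linked repository for that); the checkpoint placeholder, the relation subset, and the cosine-similarity selection heuristic are illustrative assumptions.

```python
# Hypothetical sketch: generate COMET-style commonsense inferences for a VQA
# question and select the most relevant ones by embedding similarity.
# The checkpoint name and relation tags are assumptions, not VLC-BERT's code.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from sentence_transformers import SentenceTransformer, util

COMET_CKPT = "your-comet-checkpoint"  # placeholder for a COMET-ATOMIC-2020 style seq2seq model
RELATIONS = ["xNeed", "xIntent", "AtLocation", "UsedFor"]  # illustrative relation subset

tok = AutoTokenizer.from_pretrained(COMET_CKPT)
comet = AutoModelForSeq2SeqLM.from_pretrained(COMET_CKPT)
sim_model = SentenceTransformer("all-MiniLM-L6-v2")  # off-the-shelf sentence encoder


def generate_inferences(question: str, num_beams: int = 5) -> list[str]:
    """Prompt the knowledge model once per relation and collect beam outputs."""
    inferences = []
    for rel in RELATIONS:
        prompt = f"{question} {rel} [GEN]"  # COMET-style input format (assumed)
        ids = tok(prompt, return_tensors="pt").input_ids
        out = comet.generate(ids, num_beams=num_beams, num_return_sequences=num_beams)
        inferences.extend(tok.batch_decode(out, skip_special_tokens=True))
    return inferences


def select_top_k(question: str, inferences: list[str], k: int = 5) -> list[str]:
    """Keep the k inferences most similar to the question (simple cosine heuristic)."""
    q_emb = sim_model.encode(question, convert_to_tensor=True)
    inf_emb = sim_model.encode(inferences, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, inf_emb)[0]
    top = scores.topk(min(k, len(inferences))).indices.tolist()
    return [inferences[i] for i in top]


if __name__ == "__main__":
    q = "What sport can be played on this field?"
    expansions = select_top_k(q, generate_inferences(q))
    # In the full model, these selected expansions are encoded together with
    # visual and textual features; printing them here only shows the ranking idea.
    print(expansions)
```

In the paper's pipeline the selected expansions are fused inside the pre-trained vision-language transformer rather than used in isolation; this snippet only illustrates how contextualized knowledge could be generated and filtered before encoding.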
Related Papers
50 entries in total
  • [11] MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
    Khan, Aisha Urooj
    Mazaheri, Amir
    Lobo, Niels Da Vitoria
    Shah, Mubarak
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 4648 - 4660
  • [12] External Commonsense Knowledge as a Modality for Social Intelligence Question-Answering
    Natu, Sanika
    Sural, Shounak
    Sarkar, Sulagna
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 3036 - 3042
  • [13] Knowledge-aware adaptive graph network for commonsense question answering
    Kang, Long
    Li, Xiaoge
    An, Xiaochun
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (05) : 1305 - 1324
  • [14] Meta-path reasoning of knowledge graph for commonsense question answering
    Zhang, Miao
    He, Tingting
    Dong, Ming
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (01)
  • [15] KEPR: Knowledge Enhancement and Plausibility Ranking for Generative Commonsense Question Answering
    Li, Zhifeng
    Zou, Bowei
    Fan, Yifan
    Hong, Yu
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [16] Retrieval-Augmented Knowledge Graph Reasoning for Commonsense Question Answering
    Sha, Yuchen
    Feng, Yujian
    He, Miao
    Liu, Shangdong
    Ji, Yimu
    MATHEMATICS, 2023, 11 (15)
  • [17] KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for visual commonsense reasoning
    Song, Dandan
    Ma, Siyi
    Sun, Zhanchen
    Yang, Sicheng
    Liao, Lejian
    KNOWLEDGE-BASED SYSTEMS, 2021, 230
  • [18] Incorporating Domain Knowledge and Semantic Information into Language Models for Commonsense Question Answering
    Zhou, Ruiying
    Tian, Keke
    Lai, Hanjiang
    Yin, Jian
    PROCEEDINGS OF THE 2021 IEEE 24TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2021, : 1160 - 1165
  • [19] Contextualized Attention-based Knowledge Transfer for Spoken Conversational Question Answering
    You, Chenyu
    Chen, Nuo
    Zou, Yuexian
    INTERSPEECH 2021, 2021, : 3211 - 3215
  • [20] ALBERT with Knowledge Graph Encoder Utilizing Semantic Similarity for Commonsense Question Answering
    Choi, Byeongmin
    Lee, YongHyun
    Kyung, Yeunwoong
    Kim, Eunchan
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 36 (01) : 71 - 82