Prompting large language model with context and pre-answer for knowledge-based VQA

Cited by: 7
Authors
Hu, Zhongjian [1 ,2 ]
Yang, Peng [1 ,2 ]
Jiang, Yuanshuang [2 ]
Bai, Zijian [1 ,2 ]
Affiliations
[1] Southeast Univ, Key Lab Comp Network & Informat Integrat, Minist Educ, Nanjing, Peoples R China
[2] Southeast Univ, Sch Comp Sci & Engn, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China; National Social Science Foundation of China;
Keywords
Visual question answering; Large language model; Knowledge-based VQA; Fine-tuning; In-context learning;
DOI
10.1016/j.patcog.2024.110399
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing studies apply Large Language Models (LLMs) to knowledge-based Visual Question Answering (VQA) with encouraging results. However, due to insufficient input information, previous methods still fall short in constructing the prompt for the LLM and cannot fully activate its capacity. In addition, previous works adopt GPT-3 for inference, which is expensive. In this paper, we propose PCPA: a framework that Prompts the LLM with Context and Pre-Answer for VQA. Specifically, we adopt a vanilla VQA model to generate in-context examples and candidate answers, and add a pre-answer selection layer to generate pre-answers. We integrate the in-context examples and pre-answers into the prompt to inspire the LLM. In addition, we choose LLaMA, an open and free model, instead of GPT-3, and build a small dataset to fine-tune the LLM. Compared to existing baselines, PCPA improves accuracy by more than 2.1 and 1.5 points on OK-VQA and A-OKVQA, respectively.
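The prompting scheme the abstract describes, in-context examples combined with candidate answers and a pre-answer for the test question, can be sketched as plain string assembly. This is an illustrative reconstruction only: the field names, instruction wording, and `build_pcpa_prompt` helper are assumptions, not the paper's exact template.

```python
def build_pcpa_prompt(examples, context, question, candidates, pre_answer):
    """Assemble a PCPA-style prompt for the LLM.

    Each in-context example contributes its image context, question,
    candidate answers, pre-answer, and gold answer; the test question
    follows with the same fields but an open "Answer:" slot.
    """
    lines = ["Answer the question using the context, candidates, and pre-answer."]
    for ex in examples:
        lines += [
            f"Context: {ex['context']}",
            f"Question: {ex['question']}",
            f"Candidates: {', '.join(ex['candidates'])}",
            f"Pre-answer: {ex['pre_answer']}",
            f"Answer: {ex['answer']}",
        ]
    lines += [
        f"Context: {context}",
        f"Question: {question}",
        f"Candidates: {', '.join(candidates)}",
        f"Pre-answer: {pre_answer}",
        "Answer:",  # left open for the LLM to complete
    ]
    return "\n".join(lines)
```

In the framework, the context, candidates, and pre-answers would come from the vanilla VQA model and the pre-answer selection layer; here they are passed in as already-computed strings.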
Pages: 13