MKGF: A multi-modal knowledge graph based RAG framework to enhance LVLMs for Medical visual question answering

Cited by: 0
Authors
Wu, Yinan [1 ]
Lu, Yuming [1 ]
Zhou, Yan [1 ]
Ding, Yifan [2 ]
Liu, Jingping [1 ]
Ruan, Tong [1 ]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Crit Care Med, Shanghai 200032, Peoples R China
Keywords
Multi-modal; Knowledge graph; Large language model; Retrieval
DOI
10.1016/j.neucom.2025.129999
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Medical visual question answering (MedVQA) is a challenging task that requires models to understand medical images and return accurate answers to the given questions. Most recent methods focus on transferring general-domain large vision-language models (LVLMs) to the medical domain by constructing medical instruction datasets and applying in-context learning. However, the performance of these methods is limited by the hallucination issue of LVLMs. In addition, fine-tuning the abundant parameters of LVLMs on medical instruction datasets incurs high time and economic costs. Hence, we propose the MKGF framework, which leverages a multi-modal medical knowledge graph (MMKG) to relieve the hallucination issue without fine-tuning the abundant parameters of LVLMs. First, we employ a pre-trained text retriever to build question-knowledge relations on the training set. Second, we train a multi-modal retriever with these relations. Finally, we use it to retrieve question-relevant knowledge and enhance the performance of LVLMs on the test set. To evaluate the effectiveness of MKGF, we conduct extensive experiments on two public datasets, Slake and VQA-RAD. Our method improves the pre-trained SOTA LVLMs by 10.15% and 9.32%, respectively. The source code is available at https://github.com/ehnal/MKGF.
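The retrieve-then-prompt loop described in the abstract can be sketched as follows. This is a toy illustration only, not the authors' implementation: the knowledge entries, their hand-made embeddings, and the helper names (`retrieve`, `build_prompt`) are all assumptions standing in for MKGF's trained multi-modal retriever and the MMKG.

```python
# Toy sketch of a knowledge-graph RAG step for MedVQA: embed the question,
# rank knowledge entries by cosine similarity, and prepend the top hits to
# the prompt handed to an LVLM. Embeddings are illustrative 3-d vectors.
import math

# Hypothetical MMKG facts as (text, embedding) pairs.
KNOWLEDGE = [
    ("The lung fields appear in the upper thorax on a chest X-ray.", [0.9, 0.1, 0.0]),
    ("Cardiomegaly is an enlargement of the heart silhouette.", [0.1, 0.9, 0.2]),
    ("MRI T2-weighted images show fluid as a bright signal.", [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    """Return the k knowledge strings most similar to the query embedding."""
    ranked = sorted(KNOWLEDGE, key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec):
    """Prepend retrieved knowledge to the question before calling an LVLM."""
    facts = retrieve(query_vec)
    context = "\n".join(f"- {f}" for f in facts)
    return f"Relevant knowledge:\n{context}\nQuestion: {question}"

# A question whose (assumed) joint image-text embedding leans toward cardiology.
prompt = build_prompt("Is the heart enlarged?", [0.2, 0.95, 0.1])
print(prompt)
```

In the actual framework, the query embedding would come from a trained multi-modal retriever that fuses the medical image and the question text, and the knowledge entries would be drawn from the MMKG rather than a hard-coded list.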
Pages: 10
Related Papers
50 records in total
  • [21] A question answering system for assembly process of wind turbines based on multi-modal knowledge graph and large language model
    Hu, Zhiqiang
    Li, Xinyu
    Pan, Xinyu
    Wen, Sijie
    Bao, Jinsong
    JOURNAL OF ENGINEERING DESIGN, 2023,
  • [22] Multi-Modal Knowledge-Aware Attention Network for Question Answering
    Zhang Y.
    Qian S.
    Fang Q.
    Xu C.
Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2020, 57 (05): 1037 - 1045
  • [23] MultiJAF: Multi-modal joint entity alignment framework for multi-modal knowledge graph
    Cheng, Bo
    Zhu, Jia
    Guo, Meimei
    NEUROCOMPUTING, 2022, 500 : 581 - 591
  • [24] Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering
    Salemi, Alireza
    Rafiee, Mahta
    Zamani, Hamed
    PROCEEDINGS OF THE 2023 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2023, 2023, : 169 - 176
  • [25] Multi-modal co-attention relation networks for visual question answering
    Zihan Guo
    Dezhi Han
    The Visual Computer, 2023, 39 : 5783 - 5795
  • [26] MERGE: A Modal Equilibrium Relational Graph Framework for Multi-Modal Knowledge Graph Completion
    Shang, Yuying
    Fu, Kun
    Zhang, Zequn
    Jin, Li
    Liu, Zinan
    Wang, Shensi
    Li, Shuchao
    SENSORS, 2024, 24 (23)
  • [27] A Multi-modal Debiasing Model with Dynamical Constraint for Robust Visual Question Answering
    Li, Yu
    Hu, Bojie
    Zhang, Fengshuo
    Yu, Yahan
    Liu, Jian
    Chen, Yufeng
    Xu, Jinan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 5032 - 5045
  • [28] Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
    Xu, Yiming
    Chen, Lin
    Cheng, Zhongwei
    Duan, Lixin
    Luo, Jiebo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 367 - 376
  • [29] Multi-modal co-attention relation networks for visual question answering
    Guo, Zihan
    Han, Dezhi
VISUAL COMPUTER, 2023, 39 (11): 5783 - 5795
  • [30] Research on Medical Question Answering System Based on Knowledge Graph
    Jiang, Zhixue
    Chi, Chengying
    Zhan, Yunyun
    IEEE ACCESS, 2021, 9 : 21094 - 21101