MKGF: A multi-modal knowledge graph based RAG framework to enhance LVLMs for Medical visual question answering

Cited by: 0
Authors
Wu, Yinan [1 ]
Lu, Yuming [1 ]
Zhou, Yan [1 ]
Ding, Yifan [2 ]
Liu, Jingping [1 ]
Ruan, Tong [1 ]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Fudan Univ, Zhongshan Hosp, Dept Crit Care Med, Shanghai 200032, Peoples R China
Keywords
Multi-modal; Knowledge graph; Large language model; Retrieval
DOI
10.1016/j.neucom.2025.129999
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Medical visual question answering (MedVQA) is a challenging task that requires models to understand medical images and return accurate answers to the given questions. Most recent methods focus on transferring general-domain large vision-language models (LVLMs) to the medical domain by constructing medical instruction datasets and applying in-context learning. However, the performance of these methods is limited by the hallucination issue of LVLMs. In addition, fine-tuning the abundant parameters of LVLMs on medical instruction datasets incurs high time and economic costs. Hence, we propose the MKGF framework, which leverages a multi-modal medical knowledge graph (MMKG) to relieve the hallucination issue without fine-tuning the abundant parameters of LVLMs. First, we employ a pre-trained text retriever to build question-knowledge relations on the training set. Second, we train a multi-modal retriever with these relations. Finally, we use it to retrieve question-relevant knowledge and enhance the performance of LVLMs on the test set. To evaluate the effectiveness of MKGF, we conduct extensive experiments on two public datasets, Slake and VQA-RAD. Our method improves the pre-trained SOTA LVLMs by 10.15% and 9.32%, respectively. The source code is available at https://github.com/ehnal/MKGF.
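The retrieve-then-prompt loop described in the abstract can be sketched as follows. This is a toy illustration only, not the authors' implementation: the knowledge entries, their hand-made embeddings, and the helper names (`retrieve`, `build_prompt`) are all assumptions standing in for MKGF's trained multi-modal retriever and the MMKG.

```python
# Toy sketch of a knowledge-graph RAG step for MedVQA: embed the question,
# rank knowledge entries by cosine similarity, and prepend the top hits to
# the prompt handed to an LVLM. Embeddings are illustrative 3-d vectors.
import math

# Hypothetical MMKG facts as (text, embedding) pairs.
KNOWLEDGE = [
    ("The lung fields appear in the upper thorax on a chest X-ray.", [0.9, 0.1, 0.0]),
    ("Cardiomegaly is an enlargement of the heart silhouette.", [0.1, 0.9, 0.2]),
    ("MRI T2-weighted images show fluid as a bright signal.", [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    """Return the k knowledge strings most similar to the query embedding."""
    ranked = sorted(KNOWLEDGE, key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec):
    """Prepend retrieved knowledge to the question before calling an LVLM."""
    facts = retrieve(query_vec)
    context = "\n".join(f"- {f}" for f in facts)
    return f"Relevant knowledge:\n{context}\nQuestion: {question}"

# A question whose (assumed) joint image-text embedding leans toward cardiology.
prompt = build_prompt("Is the heart enlarged?", [0.2, 0.95, 0.1])
print(prompt)
```

In the actual framework, the query embedding would come from a trained multi-modal retriever that fuses the medical image and the question text, and the knowledge entries would be drawn from the MMKG rather than a hard-coded list.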
Pages: 10
Related Papers
50 records in total
  • [21] A question answering system for assembly process of wind turbines based on multi-modal knowledge graph and large language model
    Hu, Zhiqiang
    Li, Xinyu
    Pan, Xinyu
    Wen, Sijie
    Bao, Jinsong
    JOURNAL OF ENGINEERING DESIGN, 2023,
  • [22] Multi-Modal Knowledge-Aware Attention Network for Question Answering
    Zhang Y.
    Qian S.
    Fang Q.
    Xu C.
Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2020, 57 (05): 1037 - 1045
  • [23] MultiJAF: Multi-modal joint entity alignment framework for multi-modal knowledge graph
    Cheng, Bo
    Zhu, Jia
    Guo, Meimei
    NEUROCOMPUTING, 2022, 500 : 581 - 591
  • [24] Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering
    Salemi, Alireza
    Rafiee, Mahta
    Zamani, Hamed
    PROCEEDINGS OF THE 2023 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2023, 2023, : 169 - 176
  • [25] Multi-modal co-attention relation networks for visual question answering
    Zihan Guo
    Dezhi Han
    The Visual Computer, 2023, 39 : 5783 - 5795
  • [26] MERGE: A Modal Equilibrium Relational Graph Framework for Multi-Modal Knowledge Graph Completion
    Shang, Yuying
    Fu, Kun
    Zhang, Zequn
    Jin, Li
    Liu, Zinan
    Wang, Shensi
    Li, Shuchao
    SENSORS, 2024, 24 (23)
  • [27] A Multi-modal Debiasing Model with Dynamical Constraint for Robust Visual Question Answering
    Li, Yu
    Hu, Bojie
    Zhang, Fengshuo
    Yu, Yahan
    Liu, Jian
    Chen, Yufeng
    Xu, Jinan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 5032 - 5045
  • [28] Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
    Xu, Yiming
    Chen, Lin
    Cheng, Zhongwei
    Duan, Lixin
    Luo, Jiebo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 367 - 376
  • [29] Multi-modal co-attention relation networks for visual question answering
    Guo, Zihan
    Han, Dezhi
VISUAL COMPUTER, 2023, 39 (11): 5783 - 5795
  • [30] Research on Medical Question Answering System Based on Knowledge Graph
    Jiang, Zhixue
    Chi, Chengying
    Zhan, Yunyun
    IEEE ACCESS, 2021, 9 : 21094 - 21101