Benchmarking Biomedical Relation Knowledge in Large Language Models

Cited: 0
Authors
Zhang, Fenghui [1 ]
Yang, Kuo [1 ]
Zhao, Chenqian [1 ]
Li, Haixu [1 ]
Dong, Xin [1 ]
Tian, Haoyu [1 ]
Zhou, Xuezhong [1 ]
Affiliation
[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing Key Lab Traff Data Anal & Min, Inst Med Intelligence, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
biomedical knowledge evaluation; large language model; biomedical relationship identification; benchmarking;
DOI
10.1007/978-981-97-5131-0_41
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As a special kind of knowledge base (KB), a large language model (LLM) stores a great deal of knowledge in the parameters of a deep neural network, and evaluating the accuracy of the knowledge within this KB has emerged as a key area of interest in LLM research. Although many evaluation studies of LLM knowledge have been carried out, there are still few evaluations of biomedical knowledge, owing to its complexity and scarcity. To address this, we designed five identification and evaluation tasks for the biomedical knowledge in LLMs, including the identification of genes associated with diseases, targets of drugs/compounds, drugs for diseases, and the effectiveness of herbs. We selected four well-known LLMs, namely GPT-3.5-turbo, GPT-4, ChatGLM-std, and LLaMA2-13B, to quantify the quality of the biomedical knowledge they encode. Comprehensive experiments, including an overall evaluation of accuracy and completeness, an ablation analysis, few-shot prompt optimization, and a case study, benchmarked the performance of LLMs in identifying biomedical relations and assessed the quality of the biomedical knowledge implicit in LLMs. The experimental results reveal several interesting observations, e.g., the incompleteness and bias of the knowledge held by different LLMs, which offer insight into the use of LLMs for biomedical discovery and application.
Pages: 482 - 495
Number of pages: 14
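
The abstract describes probing LLMs with relation-identification prompts and scoring the answers against curated biomedical knowledge for accuracy and completeness. The Python sketch below is a hypothetical, minimal illustration of that idea for the disease-gene task; the toy reference pairs, the few-shot prompt template, and the `ask_llm` callable are assumptions made for illustration and are not taken from the paper.

```python
"""Hypothetical sketch (not the authors' code): probe an LLM for disease-gene
relations and score its answers against a small reference knowledge base for
accuracy (precision-like) and completeness (recall-like). Model access is
abstracted as a callable, so no specific LLM API is assumed."""
from typing import Callable, Dict, Set

# Toy reference KB: disease -> genes reported to be associated with it.
# Illustrative entries only; a real benchmark would draw on curated databases.
REFERENCE_KB: Dict[str, Set[str]] = {
    "type 2 diabetes": {"TCF7L2", "PPARG", "KCNJ11"},
    "asthma": {"ORMDL3", "IL13", "ADAM33"},
}

# A one-shot prompt template, standing in for the few-shot prompt optimization
# mentioned in the abstract.
FEW_SHOT_PROMPT = (
    "List gene symbols known to be associated with the given disease.\n"
    "Disease: Alzheimer's disease\nGenes: APOE, APP, PSEN1\n"
    "Disease: {disease}\nGenes:"
)

def parse_genes(answer: str) -> Set[str]:
    """Split a comma-separated model answer into normalized gene symbols."""
    return {tok.strip().upper() for tok in answer.split(",") if tok.strip()}

def evaluate(ask_llm: Callable[[str], str]) -> Dict[str, Dict[str, float]]:
    """Query the model for each disease and score against REFERENCE_KB."""
    scores: Dict[str, Dict[str, float]] = {}
    for disease, gold in REFERENCE_KB.items():
        predicted = parse_genes(ask_llm(FEW_SHOT_PROMPT.format(disease=disease)))
        hits = len(predicted & gold)
        scores[disease] = {
            # share of returned genes that are in the reference KB
            "accuracy": hits / len(predicted) if predicted else 0.0,
            # share of reference genes the model recovered
            "completeness": hits / len(gold),
        }
    return scores

if __name__ == "__main__":
    # Stand-in "model" that always returns the same genes, just to show the flow.
    dummy = lambda prompt: "TCF7L2, PPARG, BRCA1"
    print(evaluate(dummy))
```

In a real setup, `ask_llm` would wrap whichever model is being benchmarked, and the same scoring loop could be reused across the other relation types (drug-target, disease-drug, herb effectiveness) by swapping the reference KB and prompt template.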