Benchmarking Biomedical Relation Knowledge in Large Language Models

Authors
Zhang, Fenghui [1]
Yang, Kuo [1]
Zhao, Chenqian [1]
Li, Haixu [1]
Dong, Xin [1]
Tian, Haoyu [1]
Zhou, Xuezhong [1]
Affiliation
[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing Key Lab Traff Data Anal & Min, Inst Med Intelligence, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
biomedical knowledge evaluation; large language model; biomedical relationship identification; benchmarking
DOI
10.1007/978-981-97-5131-0_41
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a special kind of knowledge base (KB), a large language model (LLM) stores a great deal of knowledge in the parameters of a deep neural network, and evaluating the accuracy of that knowledge has emerged as a key area of interest in LLM research. Although many studies have evaluated LLM knowledge, few have addressed biomedical knowledge, owing to its complexity and scarcity. To address this gap, we designed five identification and evaluation tasks for the biomedical knowledge in LLMs, covering the identification of genes for diseases, targets for drugs/compounds, drugs for diseases, and effectiveness for herbs. We selected four well-known LLMs, GPT-3.5-turbo, GPT-4, ChatGLM-std, and LLaMA2-13B, to quantify the quality of their biomedical knowledge. Comprehensive experiments, comprising an overall evaluation of accuracy and completeness, ablation analysis, few-shot prompt optimization, and a case study, benchmarked the performance of LLMs in identifying biomedical knowledge and assessed the quality of the biomedical knowledge implicit in them. The results revealed several notable observations, e.g., the incompleteness and bias of the knowledge held by different LLMs, which offer insight into the use of LLMs for biomedical discovery and application.
Pages: 482-495
Page count: 14