Benchmarking Biomedical Relation Knowledge in Large Language Models

Authors
Zhang, Fenghui [1]
Yang, Kuo [1]
Zhao, Chenqian [1]
Li, Haixu [1]
Dong, Xin [1]
Tian, Haoyu [1]
Zhou, Xuezhong [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing Key Lab Traff Data Anal & Min, Inst Med Intelligence, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
biomedical knowledge evaluation; large language model; biomedical relationship identification; benchmarking
DOI
10.1007/978-981-97-5131-0_41
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As a special kind of knowledge base (KB), a large language model (LLM) stores a great deal of knowledge in the parameters of a deep neural network, and evaluating the accuracy of the knowledge within this KB has emerged as a key area of interest in LLM research. Although many studies have evaluated LLM knowledge, few have addressed biomedical knowledge, owing to its complexity and scarcity. To address this, we designed five specific identification and evaluation tasks for the biomedical knowledge in LLMs, including the identification of genes for diseases, targets for drugs/compounds, drugs for diseases, and effectiveness for herbs. We selected four well-known LLMs, GPT-3.5-turbo, GPT-4, ChatGLM-std, and LLaMA2-13B, to quantify the quality of biomedical knowledge in LLMs. Comprehensive experiments, comprising an overall evaluation of accuracy and completeness, ablation analysis, few-shot prompt optimization, and a case study, benchmarked the performance of LLMs in identifying biomedical knowledge and assessed the quality of the biomedical knowledge implicit in them. The experimental results yielded several observations, e.g., the incompleteness and bias of the knowledge in different LLMs, which offer insight into the use of LLMs for biomedical discovery and application.
Pages: 482-495
Number of pages: 14