Evaluating and Enhancing Large Language Models' Performance in Domain-Specific Medicine: Development and Usability Study With DocOA

Cited by: 0
Authors
Chen, Xi [1, 2, 3]
Wang, Li [1, 2, 3]
You, Mingke [1, 2, 3]
Liu, Weizhi [1, 2, 3]
Fu, Yu [4]
Xu, Jie [5]
Zhang, Shaoting [5]
Chen, Gang [2, 3]
Li, Kang [5, 6, 7]
Li, Jian [1, 2, 3]
Affiliations
[1] Sichuan Univ, West China Hosp, Sports Med Ctr, 37 Guoxue Alley, Chengdu 610041, Peoples R China
[2] Sichuan Univ, Dept Orthoped, Chengdu, Peoples R China
[3] Sichuan Univ, West China Hosp, Orthoped Res Inst, Chengdu, Peoples R China
[4] Sichuan Univ, West China Hosp, West China Sch Med, Chengdu, Peoples R China
[5] Shanghai Artificial Intelligence Lab, OpenMedLab, Shanghai, Peoples R China
[6] Sichuan Univ, West China Hosp, West China Biomed Big Data Ctr, Chengdu, Peoples R China
[7] Sichuan Univ, Med X Ctr Informat, Chengdu, Peoples R China
Keywords
large language model; retrieval-augmented generation; domain-specific benchmark framework; osteoarthritis management; MANAGEMENT; HIP
DOI
Not available
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Subject classification code
Abstract
Background: The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. Objective: This study aimed to evaluate and enhance the clinical capabilities and explainability of LLMs in specific domains, using OA management as a case study. Methods: A domain-specific benchmark framework was developed to evaluate LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM for OA management that integrates retrieval-augmented generation and instructional prompts, was developed. Through retrieval-augmented generation, it can identify the clinical evidence on which its answers are based, thereby demonstrating the explainability of those answers. The study compared the performance of GPT-3.5, GPT-4, and DocOA using objective and human evaluations. Results: General LLMs such as GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations. However, DocOA showed significant improvements. Conclusions: This study introduces a novel benchmark framework that assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs.
Pages: 13