Evaluating and Enhancing Large Language Models' Performance in Domain-Specific Medicine: Development and Usability Study With DocOA

Cited by: 0
Authors
Chen, Xi [1, 2, 3]
Wang, Li [1, 2, 3]
You, Mingke [1, 2, 3]
Liu, Weizhi [1, 2, 3]
Fu, Yu [4]
Xu, Jie [5]
Zhang, Shaoting [5]
Chen, Gang [2, 3]
Li, Kang [5, 6, 7]
Li, Jian [1, 2, 3]
Affiliations
[1] Sichuan Univ, West China Hosp, Sports Med Ctr, 37 Guoxue Alley, Chengdu 610041, Peoples R China
[2] Sichuan Univ, Dept Orthoped, Chengdu, Peoples R China
[3] Sichuan Univ, West China Hosp, Orthoped Res Inst, Chengdu, Peoples R China
[4] Sichuan Univ, West China Hosp, West China Sch Med, Chengdu, Peoples R China
[5] Shanghai Artificial Intelligence Lab, OpenMedLab, Shanghai, Peoples R China
[6] Sichuan Univ, West China Hosp, West China Biomed Big Data Ctr, Chengdu, Peoples R China
[7] Sichuan Univ, Med X Ctr Informat, Chengdu, Peoples R China
Keywords
large language model; retrieval-augmented generation; domain-specific benchmark framework; osteoarthritis management; MANAGEMENT; HIP
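DOI
Not available
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Subject classification code
Abstract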
Background: The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. Objective: This study focused on evaluating and enhancing the clinical capabilities and explainability of LLMs in specific domains, using OA management as a case study. Methods: A domain-specific benchmark framework was developed to evaluate LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM designed for OA management that integrates retrieval-augmented generation and instructional prompts, was developed. Through retrieval-augmented generation, it can identify the clinical evidence on which its answers are based, thereby demonstrating the explainability of those answers. The study compared the performance of GPT-3.5, GPT-4, and the specialized assistant DocOA using objective and human evaluations. Results: General LLMs such as GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations. However, DocOA showed significant improvements. Conclusions: This study introduces a novel benchmark framework that assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs.
Pages: 13
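The abstract states that DocOA uses retrieval-augmented generation to identify the clinical evidence behind its answers. The following is a minimal sketch of that general idea only, not the authors' implementation: the evidence snippets, the IDs `E1` to `E3`, and the functions `retrieve` and `build_prompt` are all hypothetical, and a simple bag-of-words cosine similarity stands in for whatever retriever the study actually used.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical evidence snippets used purely as placeholders;
# they are NOT taken from the study or from any guideline text.
EVIDENCE = {
    "E1": "exercise therapy is recommended for knee osteoarthritis",
    "E2": "topical nsaids may be considered for knee osteoarthritis pain",
    "E3": "weight management is advised for patients with obesity",
}


def tokenize(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, k=2):
    """Return the IDs of the k snippets most similar to the question."""
    q = tokenize(question)
    scored = sorted(
        ((cosine(q, tokenize(text)), eid) for eid, text in EVIDENCE.items()),
        reverse=True,
    )
    return [eid for score, eid in scored[:k] if score > 0]


def build_prompt(question, k=2):
    """Build a prompt that carries the retrieved evidence and its IDs."""
    ids = retrieve(question, k)
    context = "\n".join(f"[{eid}] {EVIDENCE[eid]}" for eid in ids)
    prompt = (
        "Answer using only the evidence below and cite the evidence IDs.\n"
        f"{context}\nQuestion: {question}"
    )
    return prompt, ids


prompt, ids = build_prompt("Is exercise therapy recommended for knee osteoarthritis?")
print(ids)
print(prompt)
```

Because the evidence IDs travel with the prompt, an answer generated from it can be traced back to the snippets that were retrieved, which is the sense of explainability the abstract describes.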