Evaluating and Enhancing Large Language Models' Performance in Domain-Specific Medicine: Development and Usability Study With DocOA

Cited by: 0
Authors
Chen, Xi [1 ,2 ,3 ]
Wang, Li [1 ,2 ,3 ]
You, Mingke [1 ,2 ,3 ]
Liu, Weizhi [1 ,2 ,3 ]
Fu, Yu [4 ]
Xu, Jie [5 ]
Zhang, Shaoting [5 ]
Chen, Gang [2 ,3 ]
Li, Kang [5 ,6 ,7 ]
Li, Jian [1 ,2 ,3 ]
Affiliations
[1] Sichuan Univ, West China Hosp, Sports Med Ctr, 37,Guoxue Alley, Chengdu 610041, Peoples R China
[2] Sichuan Univ, Dept Orthoped, Chengdu, Peoples R China
[3] Sichuan Univ, West China Hosp, Orthoped Res Inst, Chengdu, Peoples R China
[4] Sichuan Univ, West China Hosp, West China Sch Med, Chengdu, Peoples R China
[5] Shanghai Artificial Intelligence Lab, OpenMedLab, Shanghai, Peoples R China
[6] Sichuan Univ, West China Hosp, West China Biomed Big Data Ctr, Chengdu, Peoples R China
[7] Sichuan Univ, Med X Ctr Informat, Chengdu, Peoples R China
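Keywords
large language model; retrieval-augmented generation; domain-specific benchmark framework; osteoarthritis management; MANAGEMENT; HIP;
DOI
Not available
Chinese Library Classification number
R19 [Health care organizations and services (health services administration)];
Discipline classification code
Abstract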
Background: The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. Objective: This study focused on evaluating and enhancing the clinical capabilities and explainability of LLMs in specific domains, using OA management as a case study. Methods: A domain-specific benchmark framework was developed to evaluate LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM designed for OA management that integrates retrieval-augmented generation and instructional prompts, was developed. Through retrieval-augmented generation, it can identify the clinical evidence on which its answers are based, thereby demonstrating the explainability of those answers. The study compared the performance of GPT-3.5, GPT-4, and the specialized assistant DocOA using objective and human evaluations. Results: General LLMs such as GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations. However, DocOA showed significant improvements. Conclusions: This study introduces a novel benchmark framework that assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs.
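The abstract does not describe how DocOA implements retrieval-augmented generation, so the following is only a minimal sketch of the general idea: retrieve the evidence passages most similar to a question, then build a prompt that asks the model to cite the passages it used. Everything here is an assumption made for illustration: the `EVIDENCE` store and its placeholder texts, the bag-of-words cosine similarity, and the function names `retrieve` and `build_prompt`. No LLM is called.

```python
import math
import re
from collections import Counter

# Hypothetical evidence store. IDs and texts are placeholders,
# not real guideline content and not taken from the DocOA study.
EVIDENCE = {
    "doc-1": "placeholder passage about exercise therapy for knee osteoarthritis",
    "doc-2": "placeholder passage about weight management in hip osteoarthritis",
    "doc-3": "placeholder passage about imaging criteria for diagnosis",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, k=2):
    """Return up to k evidence IDs, most similar first, skipping zero-score hits."""
    q = Counter(tokenize(question))
    scored = [
        (cosine(q, Counter(tokenize(text))), doc_id)
        for doc_id, text in EVIDENCE.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(question, k=2):
    """Assemble an instruction prompt that exposes the retrieved evidence IDs."""
    ids = retrieve(question, k)
    context = "\n".join(f"[{doc_id}] {EVIDENCE[doc_id]}" for doc_id in ids)
    prompt = (
        "Answer using only the evidence below and cite the IDs you rely on.\n"
        f"Evidence:\n{context}\n"
        f"Question: {question}"
    )
    return prompt, ids
```

Returning the retrieved IDs alongside the prompt is what lets a caller show which evidence an answer was grounded in, which is the explainability property the abstract attributes to retrieval-augmented generation. A production system would likely replace the bag-of-words scoring with dense embeddings.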
Pages: 13