Large language models facilitate the generation of electronic health record phenotyping algorithms

Cited by: 7
Authors
Yan, Chao [1]
Ong, Henry H. [1]
Grabowska, Monika E. [1]
Krantz, Matthew S. [1]
Su, Wu-Chen [1]
Dickson, Alyson L. [1, 2]
Peterson, Josh F. [1, 2]
Feng, QiPing [2]
Roden, Dan M. [1]
Stein, C. Michael [2]
Kerchberger, V. Eric [2]
Malin, Bradley A. [1, 3, 4]
Wei, Wei-Qi [1, 3, 5]
Affiliations
[1] Vanderbilt Univ, Dept Biomed Informat, Med Ctr, Nashville, TN 37203 USA
[2] Vanderbilt Univ, Dept Med, Med Ctr, Nashville, TN 37203 USA
[3] Vanderbilt Univ, Dept Comp Sci, Nashville, TN 37203 USA
[4] Vanderbilt Univ, Dept Biostat, Med Ctr, Nashville, TN 37203 USA
[5] Vanderbilt Univ, Med Ctr, Dept Biomed Informat & Comp Sci, Suite 1500,2525 West End Ave, Nashville, TN 37203 USA
Keywords
phenotyping; electronic health records; large language models; ChatGPT; MEDICAL-RECORDS;
DOI
10.1093/jamia/ocae072
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objectives: Phenotyping is a core task in observational health research utilizing electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, involving extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to enhance the efficiency of EHR phenotyping by generating high-quality algorithm drafts.

Materials and Methods: We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (ie, type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network.

Results: GPT-4 and GPT-3.5 exhibited significantly higher overall expert evaluation scores in instruction following, algorithmic logic, and SQL executability, when compared to Claude 2 and Bard. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they exhibited immature capability in organizing phenotyping criteria with the proper logic, leading to phenotyping algorithms that were either excessively restrictive (with low recall) or overly broad (with low positive predictive values).

Conclusion: GPT versions 3.5 and 4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, expertise in informatics and clinical experience is still required to assess and further refine generated algorithms.
Pages: 1994-2001
Number of pages: 8