Large language models facilitate the generation of electronic health record phenotyping algorithms

Cited by: 7

Authors:

Yan, Chao [1]

Ong, Henry H. [1]

Grabowska, Monika E. [1]

Krantz, Matthew S. [1]

Su, Wu-Chen [1]

Dickson, Alyson L. [1,2]

Peterson, Josh F. [1,2]

Feng, QiPing [2]

Roden, Dan M. [1]

Stein, C. Michael [2]

Kerchberger, V. Eric [2]

Malin, Bradley A. [1,3,4]

Wei, Wei-Qi [1,3,5]

Affiliations:

[1] Vanderbilt Univ, Dept Biomed Informat, Med Ctr, Nashville, TN 37203 USA

[2] Vanderbilt Univ, Dept Med, Med Ctr, Nashville, TN 37203 USA

[3] Vanderbilt Univ, Dept Comp Sci, Nashville, TN 37203 USA

[4] Vanderbilt Univ, Dept Biostat, Med Ctr, Nashville, TN 37203 USA

[5] Vanderbilt Univ, Med Ctr, Dept Biomed Informat & Comp Sci, Suite 1500,2525 West End Ave, Nashville, TN 37203 USA

Source:

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION | 2024 / Vol. 31 / Issue 09

Keywords:

phenotyping; electronic health records; large language models; ChatGPT; MEDICAL-RECORDS

DOI:

10.1093/jamia/ocae072

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Objectives: Phenotyping is a core task in observational health research utilizing electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, involving extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to enhance the efficiency of EHR phenotyping by generating high-quality algorithm drafts.

Materials and Methods: We prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) in October 2023, asking them to generate executable phenotyping algorithms in the form of SQL queries adhering to a common data model (CDM) for three phenotypes (ie, type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network.

Results: GPT-4 and GPT-3.5 exhibited significantly higher overall expert evaluation scores in instruction following, algorithmic logic, and SQL executability, when compared to Claude 2 and Bard. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they exhibited immature capability in organizing phenotyping criteria with the proper logic, leading to phenotyping algorithms that were either excessively restrictive (with low recall) or overly broad (with low positive predictive values).

Conclusion: GPT versions 3.5 and 4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, expertise in informatics and clinical experience is still required to assess and further refine generated algorithms.


Pages: 1994-2001

Number of pages: 8
