Large language models facilitate the generation of electronic health record phenotyping algorithms

Cited by: 7

Authors:

Yan, Chao [1]

Ong, Henry H. [1]

Grabowska, Monika E. [1]

Krantz, Matthew S. [1]

Su, Wu-Chen [1]

Dickson, Alyson L. [1,2]

Peterson, Josh F. [1,2]

Feng, QiPing [2]

Roden, Dan M. [1]

Stein, C. Michael [2]

Kerchberger, V. Eric [2]

Malin, Bradley A. [1,3,4]

Wei, Wei-Qi [1,3,5]

Affiliations:

[1] Vanderbilt Univ, Dept Biomed Informat, Med Ctr, Nashville, TN 37203 USA

[2] Vanderbilt Univ, Dept Med, Med Ctr, Nashville, TN 37203 USA

[3] Vanderbilt Univ, Dept Comp Sci, Nashville, TN 37203 USA

[4] Vanderbilt Univ, Dept Biostat, Med Ctr, Nashville, TN 37203 USA

[5] Vanderbilt Univ, Med Ctr, Dept Biomed Informat & Comp Sci, Suite 1500, 2525 West End Ave, Nashville, TN 37203 USA

Source:

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION | 2024 / Vol. 31 / Issue 9

Keywords:

phenotyping; electronic health records; large language models; ChatGPT; MEDICAL-RECORDS

DOI:

10.1093/jamia/ocae072

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Objectives: Phenotyping is a core task in observational health research that uses electronic health records (EHRs). Developing an accurate algorithm demands substantial input from domain experts, including extensive literature review and evidence synthesis. This burdensome process limits scalability and delays knowledge discovery. We investigate the potential for leveraging large language models (LLMs) to make EHR phenotyping more efficient by generating high-quality algorithm drafts.

Materials and Methods: In October 2023, we prompted four LLMs (GPT-4 and GPT-3.5 of ChatGPT, Claude 2, and Bard) to generate executable phenotyping algorithms, in the form of SQL queries adhering to a common data model (CDM), for three phenotypes (i.e., type 2 diabetes mellitus, dementia, and hypothyroidism). Three phenotyping experts evaluated the returned algorithms across several critical metrics. We further implemented the top-rated algorithms and compared them against clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network.

Results: GPT-4 and GPT-3.5 received significantly higher overall expert evaluation scores than Claude 2 and Bard in instruction following, algorithmic logic, and SQL executability. Although GPT-4 and GPT-3.5 effectively identified relevant clinical concepts, they showed immature capability in organizing phenotyping criteria with the proper logic, producing algorithms that were either excessively restrictive (low recall) or overly broad (low positive predictive value).

Conclusion: GPT-3.5 and GPT-4 are capable of drafting phenotyping algorithms by identifying relevant clinical criteria aligned with a CDM. However, informatics expertise and clinical experience are still required to assess and further refine the generated algorithms.
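To make the recall/PPV trade-off described in the Results concrete, here is a minimal Python sketch; it is not the authors' code or prompts. It assumes an OMOP-style `condition_occurrence` table (the abstract only says "a common data model"), and the concept IDs, patient rows, and reference cohort are all hypothetical toy data. The reference set merely stands in for a clinician-validated algorithm such as those from eMERGE, and the helper names `build_toy_db`, `run_phenotype`, and `recall_and_ppv` are illustrative.

```python
import sqlite3

# Hypothetical concept IDs standing in for "type 2 diabetes" codes; these are
# placeholders, NOT real OMOP concept IDs.
T2DM_CONCEPTS = (1001, 1002)


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory table loosely modeled on OMOP's condition_occurrence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "person_id INTEGER, condition_concept_id INTEGER)"
    )
    # Toy diagnosis records: (person_id, condition_concept_id).
    rows = [(1, 1001), (1, 1001), (2, 1002), (3, 1001),
            (4, 9999), (5, 1002), (5, 1002)]
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?)", rows)
    return conn


def run_phenotype(conn, concept_ids, min_codes):
    """Draft rule: persons with at least `min_codes` records of any target concept."""
    placeholders = ",".join(["?"] * len(concept_ids))
    sql = (
        "SELECT person_id FROM condition_occurrence "
        f"WHERE condition_concept_id IN ({placeholders}) "
        "GROUP BY person_id HAVING COUNT(*) >= ?"
    )
    return {row[0] for row in conn.execute(sql, (*concept_ids, min_codes))}


def recall_and_ppv(predicted, reference):
    """Recall = TP / |reference|; PPV = TP / |predicted|."""
    tp = len(predicted & reference)
    recall = tp / len(reference) if reference else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    return recall, ppv


if __name__ == "__main__":
    conn = build_toy_db()
    reference = {1, 2, 5}  # hypothetical "validated" cases
    for min_codes in (1, 2):
        cohort = run_phenotype(conn, T2DM_CONCEPTS, min_codes)
        recall, ppv = recall_and_ppv(cohort, reference)
        print(min_codes, sorted(cohort), round(recall, 2), round(ppv, 2))
```

With a threshold of one code, the toy rule captures every reference case (recall 1.0) but also flags person 3 (PPV 0.75). Requiring two codes removes that false positive (PPV 1.0) but misses person 2 (recall about 0.67), mirroring the "overly broad" versus "excessively restrictive" failure modes on a small scale.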


Pages: 1994-2001

Page count: 8
