GPT-4 Performance for Neurologic Localization

Cited by: 3

Authors:

Lee, Jung-Hyun [1,2,3]

Choi, Eunhee [4]

McDougal, Robert [5,6,7,8]

Lytton, William W. [1,2,9]

Affiliations:

[1] SUNY Downstate Hlth Sci Univ, Dept Neurol, Brooklyn, NY 11203 USA

[2] Kings Cty Hosp, Dept Neurol, Brooklyn, NY 11203 USA

[3] Maimonides Hosp, Dept Neurol, Brooklyn, NY 11219 USA

[4] Lincoln Med Ctr, Dept Internal Med, Bronx, NY USA

[5] Yale Univ, Yale Sch Publ Hlth, Dept Biostat, New Haven, CT USA

[6] Yale Univ, Yale Sch Med, Program Computat Biol & Bioinformat, New Haven, CT USA

[7] Yale Univ, Wu Tsai Inst, Yale Sch Med, New Haven, CT USA

[8] Yale Univ, Yale Sch Med, Sect Biomed Informat & Data Sci, New Haven, CT USA

[9] SUNY Downstate Hlth Sci Univ, Dept Physiol & Pharmacol, Brooklyn, NY USA

Source:

NEUROLOGY-CLINICAL PRACTICE | 2024 / Vol. 14 / Issue 03

DOI:

10.1212/CPJ.0000000000200293

Chinese Library Classification (CLC):

R74 [Neurology and Psychiatry]

Abstract:

Background and Objectives: In health care, large language models such as Generative Pretrained Transformers (GPTs), trained on extensive text datasets, have potential applications in reducing health care disparities across regions and populations. Previous software developed for lesion localization has been limited in scope. This study aims to evaluate the capability of GPT-4 for lesion localization based on clinical presentation.

Methods: GPT-4 was prompted using history and neurologic physical examination (H&P) from published cases of acute stroke followed by questions for clinical reasoning with answering for "single or multiple lesions," "side," and "brain region" using Zero-Shot Chain-of-Thought and Text Classification prompting. GPT-4 output on 3 separate trials for each of 46 cases was compared with imaging-based localization.

Results: GPT-4 successfully processed raw text from H&P to generate accurate neuroanatomical localization and detailed clinical reasoning. Performance metrics across trial-based analysis for specificity, sensitivity, precision, and F1-score were 0.87, 0.74, 0.75, and 0.74, respectively, for side; 0.94, 0.85, 0.84, and 0.85, respectively, for brain region. Class labels within the brain region were similarly high for all regions except the cerebellum and were also similar when considering all 3 trials to examine metrics by case. Errors were due to extrinsic causes (inadequate information in the published cases) and intrinsic causes (failures of logic or inadequate knowledge base).

Discussion: This study reveals capabilities of GPT-4 in the localization of acute stroke lesions, showing a potential future role as a clinical tool in neurology.
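For readers unfamiliar with the four metrics reported in the Results, the following is a minimal sketch of how per-class specificity, sensitivity, precision, and F1-score can be computed in a one-vs-rest fashion from predicted and reference labels. It is not the authors' analysis code. The function name `per_class_metrics`, the single-label-per-case setup, and the six toy labels are hypothetical and chosen only for illustration; the abstract does not state how the published figures were aggregated across classes.

```python
def per_class_metrics(y_true, y_pred, labels):
    """One-vs-rest metrics for each class label (hypothetical helper)."""
    n = len(y_true)
    results = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        tn = n - tp - fp - fn

        # Guard against empty denominators by reporting 0.0.
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0

        results[label] = {
            "specificity": specificity,
            "sensitivity": sensitivity,
            "precision": precision,
            "f1": f1,
        }
    return results


# Toy data only (NOT from the study): imaging-based side vs. model output.
imaging = ["left", "left", "right", "right", "bilateral", "left"]
model = ["left", "right", "right", "right", "bilateral", "left"]

metrics = per_class_metrics(imaging, model, ["left", "right", "bilateral"])
for label, values in metrics.items():
    print(label, {name: round(v, 2) for name, v in values.items()})
```

Averaging the per-class values (for example, an unweighted macro average) would yield a single summary figure per metric; whether the study used that or another aggregation is not specified in the abstract.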

Pages: 8
