The accuracy of Gemini, GPT-4, and GPT-4o in ECG analysis: A comparison with cardiologists and emergency medicine specialists

Cited by: 8

Authors:

Gunay, Serkan [1]

Ozturk, Ahmet [1]

Yigit, Yavuz [2]

Affiliations:

[1] Hitit Univ, Corum Erol Olcok Educ & Res Hosp, Dept Emergency Med, Emergency Med, Corum, Turkiye

[2] Hamad Gen Hosp, Dept Emergency Med, Emergency Med, Hamad Med Corp, Doha, Qatar

Source:

AMERICAN JOURNAL OF EMERGENCY MEDICINE | 2024 / Volume 84

Keywords:

Artificial intelligence; ChatGPT; GPT-4; Gemini; Electrocardiography; GPT-4o

DOI:

10.1016/j.ajem.2024.07.043

Chinese Library Classification (CLC) number:

R4 [Clinical Medicine]

Discipline classification codes:

1002; 100602

Abstract:

Introduction: GPT-4, GPT-4o, and Gemini Advanced, which are among the well-known large language models (LLMs), have the capability to recognize and interpret visual data. Very few studies in the literature have examined the ECG performance of GPT-4, and none has examined the success of Gemini and GPT-4o in ECG evaluation. The aim of our study is to evaluate the performance of GPT-4, GPT-4o, and Gemini in ECG evaluation, assess their usability in the medical field, and compare their accuracy rates in ECG interpretation with those of cardiologists and emergency medicine specialists.

Methods: The study was conducted from May 14, 2024, to June 3, 2024. The book "150 ECG Cases" served as a reference, containing two sections: daily routine ECGs and more challenging ECGs. For this study, two emergency medicine specialists selected 20 ECG cases from each section, totaling 40 cases. In the next stage, the questions were evaluated by emergency medicine specialists and cardiologists. In the subsequent phase, a diagnostic question was entered daily into GPT-4, GPT-4o, and Gemini Advanced on separate chat interfaces. In the final phase, the responses provided by cardiologists, emergency medicine specialists, GPT-4, GPT-4o, and Gemini Advanced were statistically evaluated across three categories: routine daily ECGs, more challenging ECGs, and the total number of ECGs.

Results: Cardiologists outperformed GPT-4, GPT-4o, and Gemini Advanced in all three groups. Emergency medicine specialists performed better than GPT-4o in routine daily ECG questions and total ECG questions (p = 0.003 and p = 0.042, respectively). When comparing GPT-4o with Gemini Advanced and GPT-4, GPT-4o performed better in total ECG questions (p = 0.027 and p < 0.001, respectively). In routine daily ECG questions, GPT-4o also outperformed Gemini Advanced (p = 0.004). Weak agreement was observed in the responses given by GPT-4 (p < 0.001, Fleiss Kappa = 0.265) and Gemini Advanced (p < 0.001, Fleiss Kappa = 0.347), while moderate agreement was observed in the responses given by GPT-4o (p < 0.001, Fleiss Kappa = 0.514).

Conclusion: While GPT-4o shows promise, especially in more challenging ECG questions, and may have potential as an assistant for ECG evaluation, its performance in routine and overall assessments still lags behind human specialists. The limited accuracy and consistency of GPT-4 and Gemini suggest that their current use in clinical ECG interpretation is risky.
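
Note: the abstract reports Fleiss' kappa as its measure of how consistently each model answered. As a rough illustration only, the statistic can be computed as in the sketch below. The function name `fleiss_kappa`, the two-category layout (correct / incorrect), and the small example table are all hypothetical; the record does not state how the authors coded or repeated the responses, and none of the numbers below come from the study.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of ratings that placed item i in category j.
    Every item must have received the same number of ratings (at least 2).
    """
    if not counts:
        raise ValueError("need at least one rated item")
    n_items = len(counts)
    n_categories = len(counts[0])
    n_ratings = sum(counts[0])
    if n_ratings < 2:
        raise ValueError("need at least two ratings per item")
    if any(len(row) != n_categories or sum(row) != n_ratings for row in counts):
        raise ValueError("every item needs the same categories and rating count")

    # Observed agreement: mean over items of the share of agreeing rating pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_ratings) / (n_ratings * (n_ratings - 1))
        for row in counts
    ) / n_items

    # Chance agreement: sum of squared overall category proportions.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_ratings)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")

    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical data: 5 questions, each answered 3 times by one model,
# columns are [correct, incorrect]. Not taken from the study.
example = [[3, 0], [2, 1], [0, 3], [1, 2], [3, 0]]
print(round(fleiss_kappa(example), 3))
```

A value of 1 means the repeated answers always agreed, while values near 0 mean agreement no better than chance.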

Pages: 68-73

Number of pages: 6
