The accuracy of Gemini, GPT-4, and GPT-4o in ECG analysis: A comparison with cardiologists and emergency medicine specialists

Cited by: 8
Authors:
Gunay, Serkan [1 ]
Ozturk, Ahmet [1 ]
Yigit, Yavuz [2 ]
Affiliations:
[1] Hitit Univ, Corum Erol Olcok Educ & Res Hosp, Dept Emergency Med, Emergency Med, Corum, Turkiye
[2] Hamad Gen Hosp, Dept Emergency Med, Emergency Med, Hamad Med Corp, Doha, Qatar
Keywords:
Artificial intelligence; ChatGPT; GPT-4; Gemini; Electrocardiography; GPT-4o
DOI: 10.1016/j.ajem.2024.07.043
Chinese Library Classification: R4 [Clinical medicine]
Discipline codes: 1002; 100602
Abstract
Introduction: GPT-4, GPT-4o, and Gemini Advanced, three well-known large language models (LLMs), can recognize and interpret visual data. A review of the literature reveals only a very limited number of studies examining the ECG performance of GPT-4, and none examining the performance of Gemini or GPT-4o in ECG evaluation. The aim of our study is to evaluate the performance of GPT-4, GPT-4o, and Gemini Advanced in ECG evaluation, assess their usability in the medical field, and compare their accuracy in ECG interpretation with that of cardiologists and emergency medicine specialists.
Methods: The study was conducted from May 14, 2024, to June 3, 2024. The book "150 ECG Cases" served as the reference; it contains two sections, routine daily ECGs and more challenging ECGs. Two emergency medicine specialists selected 20 ECG cases from each section, for a total of 40 cases. The questions were then answered by emergency medicine specialists and cardiologists. In the subsequent phase, a diagnostic question was entered daily into GPT-4, GPT-4o, and Gemini Advanced in separate chat interfaces. In the final phase, the responses of the cardiologists, emergency medicine specialists, GPT-4, GPT-4o, and Gemini Advanced were statistically compared across three categories: routine daily ECGs, more challenging ECGs, and all ECGs combined.
Results: Cardiologists outperformed GPT-4, GPT-4o, and Gemini Advanced in all three categories. Emergency medicine specialists performed better than GPT-4o on routine daily ECG questions and on total ECG questions (p = 0.003 and p = 0.042, respectively). Among the models, GPT-4o outperformed both Gemini Advanced and GPT-4 on total ECG questions (p = 0.027 and p < 0.001, respectively) and also outperformed Gemini Advanced on routine daily ECG questions (p = 0.004). Agreement across repeated responses was weak for GPT-4 (p < 0.001, Fleiss kappa = 0.265) and Gemini Advanced (p < 0.001, Fleiss kappa = 0.347), and moderate for GPT-4o (p < 0.001, Fleiss kappa = 0.514).
Conclusion: While GPT-4o shows promise, especially on more challenging ECG questions, and may have potential as an assistant for ECG evaluation, its performance on routine and overall assessments still lags behind human specialists. The limited accuracy and consistency of GPT-4 and Gemini suggest that their current use in clinical ECG interpretation is risky.
Pages: 68-73 (6 pages)
Related papers (50 total):
  • [1] The accuracy of Gemini, GPT-4, and GPT-4o in ECG analysis: A comparison with cardiologists and emergency medicine specialists
    Wang, Haihua; Lan, Ji
    AMERICAN JOURNAL OF EMERGENCY MEDICINE, 2025, 87: 197
  • [2] From GPT-4 to GPT-4o: Progress and Challenges in ECG Interpretation
    Pandya, Vidish; Ge, Alan; Ramineni, Shreya; Danilov, Alexandrina; Kirdar, Faisal; Di Biase, Luigi; Ferrick, Kevin; Krumerman, Andrew
    CIRCULATION, 2024, 150
  • [3] Custom GPTs Enhancing Performance and Evidence Compared with GPT-3.5, GPT-4, and GPT-4o? A Study on the Emergency Medicine Specialist Examination
    Liu, Chiu-Liang; Ho, Chien-Ta; Wu, Tzu-Chi
    HEALTHCARE, 2024, 12 (17)
  • [4] An exploratory assessment of GPT-4o and GPT-4 performance on the Japanese National Dental Examination
    Morishita, Masaki; Fukuda, Hikaru; Yamaguchi, Shino; Muraoka, Kosuke; Nakamura, Taiji; Hayashi, Masanari; Yoshioka, Izumi; Ono, Kentaro; Awano, Shuji
    SAUDI DENTAL JOURNAL, 2024, 36 (12): 1577-1581
  • [5] Assessing the accuracy and efficiency of Chat GPT-4 Omni (GPT-4o) in biomedical statistics: Comparative study with traditional tools
    Meo, Anusha S.; Shaikh, Narmeen; Meo, Sultan A.
    SAUDI MEDICAL JOURNAL, 2024, 45 (12): 1383-1390
  • [6] Capability of GPT-4o in cranial imaging interpretation for emergency medicine
    Beeler, Muhammed Said
    AMERICAN JOURNAL OF EMERGENCY MEDICINE, 2025, 87: 186
  • [7] Evaluating the accuracy, time and cost of GPT-4 and GPT-4o in liver disease diagnoses using cases from "What is Your Diagnosis"
    Guo, Yusheng; Li, Tianxiang; Xie, Jiao; Luo, Miao; Zheng, Chuansheng
    JOURNAL OF HEPATOLOGY, 2025, 82 (1): e15-e17
  • [8] Evaluating the Visual Accuracy of Gemini Pro 1.5 and GPT-4o in Identifying Endoscopic Anatomical Landmarks
    Kerbage, Anthony; Souaid, Tarek; Macaron, Carole; Burke, Carol A.; Rouphael, Carol
    AMERICAN JOURNAL OF GASTROENTEROLOGY, 2024, 119 (10S)
  • [9] ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis
    Hoppe, John Michael; Auer, Matthias K.; Strueven, Anna; Massberg, Steffen; Stremmel, Christopher
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [10] GPT-4 in Nuclear Medicine Education: Does It Outperform GPT-3.5?
    Currie, Geoffrey M.
    JOURNAL OF NUCLEAR MEDICINE TECHNOLOGY, 2023, 51 (04): 314-317