The accuracy of Gemini, GPT-4, and GPT-4o in ECG analysis: A comparison with cardiologists and emergency medicine specialists

Cited by: 8
Authors
Gunay, Serkan [1 ]
Ozturk, Ahmet [1 ]
Yigit, Yavuz [2 ]
Affiliations
[1] Hitit Univ, Corum Erol Olcok Educ & Res Hosp, Dept Emergency Med, Emergency Med, Corum, Turkiye
[2] Hamad Gen Hosp, Dept Emergency Med, Emergency Med, Hamad Med Corp, Doha, Qatar
Source
American Journal of Emergency Medicine
Keywords
Artificial intelligence; ChatGPT; GPT-4; Gemini; Electrocardiography; GPT-4o
DOI
10.1016/j.ajem.2024.07.043
Chinese Library Classification
R4 [Clinical Medicine]
Discipline codes
1002; 100602
Abstract
Introduction: GPT-4, GPT-4o, and Gemini Advanced, among the best-known large language models (LLMs), can recognize and interpret visual data. A review of the literature shows only a very limited number of studies examining the ECG performance of GPT-4, and no study has yet examined the performance of Gemini or GPT-4o in ECG evaluation. The aim of our study is to evaluate the performance of GPT-4, GPT-4o, and Gemini in ECG evaluation, assess their usability in the medical field, and compare their accuracy in ECG interpretation with that of cardiologists and emergency medicine specialists.

Methods: The study was conducted from May 14, 2024, to June 3, 2024. The book "150 ECG Cases" served as the reference; it contains two sections, everyday routine ECGs and more challenging ECGs. Two emergency medicine specialists selected 20 ECG cases from each section, for a total of 40 cases. The questions were then answered by emergency medicine specialists and cardiologists. In the subsequent phase, a diagnostic question for each case was entered daily into GPT-4, GPT-4o, and Gemini Advanced in separate chat interfaces. In the final phase, the responses of the cardiologists, emergency medicine specialists, GPT-4, GPT-4o, and Gemini Advanced were statistically compared across three categories: routine daily ECGs, more challenging ECGs, and all ECGs combined.

Results: Cardiologists outperformed GPT-4, GPT-4o, and Gemini Advanced in all three categories. Emergency medicine specialists performed better than GPT-4o on routine daily ECG questions and on the total set of ECG questions (p = 0.003 and p = 0.042, respectively). GPT-4o performed better than Gemini Advanced and GPT-4 on the total set of ECG questions (p = 0.027 and p < 0.001, respectively), and also outperformed Gemini Advanced on routine daily ECG questions (p = 0.004). Agreement across repeated responses was weak for GPT-4 (p < 0.001, Fleiss kappa = 0.265) and Gemini Advanced (p < 0.001, Fleiss kappa = 0.347), and moderate for GPT-4o (p < 0.001, Fleiss kappa = 0.514).

Conclusion: While GPT-4o shows promise, especially on the more challenging ECG questions, and may have potential as an assistant for ECG evaluation, its performance in routine and overall assessments still lags behind that of human specialists. The limited accuracy and consistency of GPT-4 and Gemini suggest that their current use in clinical ECG interpretation is risky.
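The abstract reports two kinds of statistics: Fleiss' kappa for the consistency of each model's repeated daily answers, and pairwise accuracy comparisons (p-values) between responders on the same 40 cases. As a rough illustration of how such figures can be computed, the sketch below uses statsmodels on placeholder data; the correctness coding, the number of repeated runs, and the use of McNemar's test for the paired comparison are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of the two statistics reported in the
# abstract: Fleiss' kappa for agreement of an LLM's repeated daily answers, and
# a paired accuracy comparison between two responders on the same 40 ECG cases.
# All arrays below are made-up placeholders, not the study's data.
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa, aggregate_raters
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)

# --- Agreement of repeated LLM runs ----------------------------------------
# Rows = 40 ECG cases, columns = independent daily runs of one model,
# entries = 1 (correct diagnosis) or 0 (incorrect); the coding scheme is an
# assumption here.
runs = rng.integers(0, 2, size=(40, 3))
table, _ = aggregate_raters(runs)              # per-case counts for each category
kappa = fleiss_kappa(table, method="fleiss")
print(f"Fleiss' kappa across repeated runs: {kappa:.3f}")

# --- Paired accuracy comparison on the same cases ---------------------------
# One plausible test (the abstract reports only p-values, not the exact test):
# McNemar's test on the 2x2 table of case-level correctness for two responders.
model_correct = rng.integers(0, 2, size=40)    # e.g. GPT-4o, 1 = correct
human_correct = rng.integers(0, 2, size=40)    # e.g. emergency specialist
both       = np.sum((model_correct == 1) & (human_correct == 1))
only_model = np.sum((model_correct == 1) & (human_correct == 0))
only_human = np.sum((model_correct == 0) & (human_correct == 1))
neither    = np.sum((model_correct == 0) & (human_correct == 0))
ct = np.array([[both, only_model], [only_human, neither]])
res = mcnemar(ct, exact=True)
print(f"McNemar p-value for paired accuracy difference: {res.pvalue:.3f}")
```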
Pages: 68-73
Number of pages: 6