The accuracy of Gemini, GPT-4, and GPT-4o in ECG analysis: A comparison with cardiologists and emergency medicine specialists

Cited by: 8
Authors
Gunay, Serkan [1]
Ozturk, Ahmet [1]
Yigit, Yavuz [2]
Affiliations
[1] Hitit Univ, Corum Erol Olcok Educ & Res Hosp, Dept Emergency Med, Emergency Med, Corum, Turkiye
[2] Hamad Gen Hosp, Dept Emergency Med, Emergency Med, Hamad Med Corp, Doha, Qatar
Keywords
Artificial intelligence; ChatGPT; GPT-4; Gemini; Electrocardiography; GPT-4o
DOI
10.1016/j.ajem.2024.07.043
Chinese Library Classification
R4 [Clinical Medicine]
Discipline Classification Code
1002; 100602
Abstract
Introduction: GPT-4, GPT-4o, and Gemini Advanced, which are among the best-known large language models (LLMs), can recognize and interpret visual data. A review of the literature reveals only a very limited number of studies examining the ECG performance of GPT-4, and no study has yet examined the performance of Gemini or GPT-4o in ECG evaluation. The aim of our study is to evaluate the performance of GPT-4, GPT-4o, and Gemini in ECG evaluation, assess their usability in the medical field, and compare their accuracy in ECG interpretation with that of cardiologists and emergency medicine specialists. Methods: The study was conducted from May 14, 2024, to June 3, 2024. The book "150 ECG Cases" served as the reference; it contains two sections, daily routine ECGs and more challenging ECGs. Two emergency medicine specialists selected 20 ECG cases from each section, for a total of 40 cases. The questions were then evaluated by emergency medicine specialists and cardiologists. In the subsequent phase, one diagnostic question per day was entered into GPT-4, GPT-4o, and Gemini Advanced in separate chat interfaces. In the final phase, the responses of the cardiologists, emergency medicine specialists, GPT-4, GPT-4o, and Gemini Advanced were statistically compared across three categories: routine daily ECGs, more challenging ECGs, and all ECGs combined. Results: Cardiologists outperformed GPT-4, GPT-4o, and Gemini Advanced in all three categories. Emergency medicine specialists performed better than GPT-4o on routine daily ECG questions and on the total set of ECG questions (p = 0.003 and p = 0.042, respectively). GPT-4o performed better than Gemini Advanced and GPT-4 on the total set of ECG questions (p = 0.027 and p < 0.001, respectively), and also outperformed Gemini Advanced on routine daily ECG questions (p = 0.004). Weak agreement was observed in the responses of GPT-4 (p < 0.001, Fleiss' kappa = 0.265) and Gemini Advanced (p < 0.001, Fleiss' kappa = 0.347), while moderate agreement was observed in the responses of GPT-4o (p < 0.001, Fleiss' kappa = 0.514). Conclusion: While GPT-4o shows promise, especially on the more challenging ECG questions, and may have potential as an assistant for ECG evaluation, its performance in routine and overall assessments still lags behind that of human specialists. The limited accuracy and consistency of GPT-4 and Gemini suggest that their current use in clinical ECG interpretation is risky.
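The agreement figures reported in the abstract use Fleiss' kappa, which measures chance-corrected consistency when the same items (here, ECG cases) are rated repeatedly. As a minimal sketch of how the statistic is computed (pure Python, no dependencies; the example matrix in the usage note is the standard Fleiss (1971) textbook illustration, not data from this study):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    counts[i][j] = number of ratings assigning subject i to category j;
    every row must sum to the same number of ratings n per subject.
    """
    N = len(counts)          # number of subjects
    n = sum(counts[0])       # ratings per subject
    # Per-subject observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    # Marginal category proportions p_j and chance agreement P_e
    k = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(k)]
    p = [t / (N * n) for t in totals]
    P_e = sum(q * q for q in p)
    return (P_bar - P_e) / (1 - P_e)


# Classic illustration from Fleiss (1971): 10 subjects, 14 raters, 5 categories.
ratings = [
    [0, 0, 0, 0, 14],
    [0, 2, 6, 4, 2],
    [0, 0, 3, 5, 6],
    [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1],
    [7, 7, 0, 0, 0],
    [3, 2, 6, 3, 0],
    [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0],
    [0, 2, 2, 3, 7],
]
print(round(fleiss_kappa(ratings), 3))  # 0.21
```

By the conventional Landis–Koch benchmarks, the paper's values fall in the "fair" band for GPT-4 and Gemini Advanced (0.21–0.40) and the "moderate" band for GPT-4o (0.41–0.60).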
Pages: 68-73 (6 pages)
Related papers
(50 in total; entries [21]-[30] shown)
  • [21] Lee, Peter; Bubeck, Sebastien; Petro, Joseph. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. NEW ENGLAND JOURNAL OF MEDICINE, 2023, 388(13): 1233-1239
  • [22] Cai, Xinjian; Zhan, Lili; Lin, Yiteng. Assessing the accuracy and clinical utility of GPT-4o in abnormal blood cell morphology recognition. DIGITAL HEALTH, 2024, 10
  • [23] Reyes-Rivera, Jonathan; Molina, Alberto Castro; Romero-Lorenzo, Marco; Ali, Sajid; Gibson, Charles; Saucedo, Jorge; Calandrelli, Matias; Cruz, Edgar Garcia; Bahit, Cecilia; Chi, Gerald; Angulo, Stephanie; Moore, Michelle; Lopez-Quijano, Juan M.; Samman, Abdallah; Gordillo-Moscoso, Antonio A.; Ali, Asif. Evaluating the Clinical Reasoning of GPT-4, Grok, and Gemini in Different Fields of Cardiology. CIRCULATION, 2024, 150
  • [24] Cerame, Alvaro; Juaneda, Juan; Estrella-Porter, Pablo; de la Puente, Lucia; Navarro, Joaquin; Garcia, Eva; Sanchez, Domingo A.; Carrasco, Juan Pablo. Is GPT-4 capable of passing MIR 2023? Comparison between GPT-4 and ChatGPT-3 in the MIR 2022 and 2023 exams. SPANISH JOURNAL OF MEDICAL EDUCATION, 2024, 5(02)
  • [25] Takagi, Soshi; Watari, Takashi; Erabi, Ayano; Sakaguchi, Kota. Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study. JMIR MEDICAL EDUCATION, 2023, 9
  • [26] Chen, Chih-Hsiung; Hsieh, Kuang-Yu; Huang, Kuo-En; Lai, Hsien-Yun. Comparing Vision-Capable Models, GPT-4 and Gemini, With GPT-3.5 on Taiwan's Pulmonologist Exam. CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16(08)
  • [27] Othman, Achraf; Chemnad, Khansa; Tlili, Ahmed; Da, Ting; Wang, Huanhuan; Huang, Ronghuai. Comparative analysis of GPT-4, Gemini, and Ernie as gloss sign language translators in special education. DISCOVER GLOBAL SOCIETY, 2(1)
  • [28] Sonoda, Yuki; Kurokawa, Ryo; Nakamura, Yuta; Kanzawa, Jun; Kurokawa, Mariko; Ohizumi, Yuji; Gonoi, Wataru; Abe, Osamu. Diagnostic performances of GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro in "Diagnosis Please" cases. JAPANESE JOURNAL OF RADIOLOGY, 2024, 42(11): 1231-1235
  • [29] Jin, Qiao; Chen, Fangyuan; Zhou, Yiliang; Xu, Ziyang; Cheung, Justin M.; Chen, Robert; Summers, Ronald M.; Rousseau, Justin F.; Ni, Peiyun; Landsman, Marc J.; Baxter, Sally L.; Al'Aref, Subhi J.; Li, Yijia; Chen, Alexander; Brejt, Josef A.; Chiang, Michael F.; Peng, Yifan; Lu, Zhiyong. Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine. NPJ DIGITAL MEDICINE, 2024, 7(01)
  • [30] Samaan, Jamil S.; Rajeev, Nithya; Ng, Wee Han; Srinivasan, Nitin; Busam, Jonathan A.; Yeo, Yee Hui; Samakar, Kamran. ChatGPT as a Source of Information for Bariatric Surgery Patients: a Comparative Analysis of Accuracy and Comprehensiveness Between GPT-4 and GPT-3.5. OBESITY SURGERY, 2024, 34: 1987-1989