Performance of ChatGPT and GPT-4 on Polish National Specialty Exam (NSE) in Ophthalmology

Cited: 0
|
Authors
Ciekalski, Marcin [1 ]
Laskowski, Maciej [1 ]
Koperczak, Agnieszka [1 ]
Smierciak, Maria [1 ]
Sirek, Sebastian [2 ]
Affiliations
[1] Med Univ Silesia, Fac Med Sci Katowice, Student Sci Soc, Dept Ophthalmol, Katowice, Poland
[2] Med Univ Silesia, Fac Med Sci Katowice, Dept Ophthalmol, Katowice, Poland
Source
ADVANCES IN HYGIENE AND EXPERIMENTAL MEDICINE, 2024
Keywords
ophthalmology; ChatGPT; Polish national specialty exam;
DOI
10.2478/ahem-2024-0006
Chinese Library Classification (CLC)
R-3 [Medical research methodology]; R3 [Basic medicine];
Discipline Code
1001;
Abstract
Introduction: Artificial intelligence (AI) has evolved significantly, driven by advancements in computing power and big data. Technologies such as machine learning and deep learning have led to sophisticated models like GPT-3.5 and GPT-4. This study assesses the performance of these AI models on the Polish National Specialty Exam (NSE) in ophthalmology, exploring their potential to support research, education, and clinical decision-making in healthcare.

Materials and Methods: The study analyzed 98 questions from the Spring 2023 Polish NSE in Ophthalmology. Questions were categorized into five groups: Physiology & Diagnostics, Clinical & Case Questions, Treatment & Pharmacology, Surgery, and Pediatrics. GPT-3.5 and GPT-4 were tested for their accuracy in answering these questions, with a confidence rating from 1 to 5 assigned to each response. Statistical analyses, including the Chi-squared test and the Mann-Whitney U test, were employed to compare the models' performance.

Results: GPT-4 demonstrated a significant improvement over GPT-3.5, correctly answering 63.3% of questions compared to GPT-3.5's 37.8%. GPT-4's performance met the passing criteria for the NSE. The models showed varying degrees of accuracy across the categories, with a notable gap in fields such as surgery and pediatrics.

Conclusions: The study highlights the potential of GPT models to aid clinical decision-making and education in ophthalmology. However, it also underscores the models' limitations, particularly in specialized fields such as surgery and pediatrics. The findings suggest that while AI models like GPT-3.5 and GPT-4 can significantly assist in the medical field, they require further development and fine-tuning to address specific challenges in various medical domains.
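As a rough illustration of the statistical comparison described in the abstract (not the authors' code), the Python sketch below applies the same two tests to the reported results. The correct-answer counts are derived from the stated percentages (63.3% and 37.8% of 98 questions, i.e., 62 and 37 correct answers), and the 1-5 confidence ratings are hypothetical placeholders, not data from the paper.

```python
# Minimal sketch of the abstract's statistical comparison; the counts
# below are derived from the reported percentages, and the confidence
# ratings are hypothetical placeholders.
from scipy.stats import chi2_contingency, mannwhitneyu

N_QUESTIONS = 98
correct = {"GPT-4": 62, "GPT-3.5": 37}  # derived: 63.3% and 37.8% of 98

# 2x2 contingency table: rows = model, columns = (correct, incorrect)
table = [
    [correct["GPT-4"], N_QUESTIONS - correct["GPT-4"]],
    [correct["GPT-3.5"], N_QUESTIONS - correct["GPT-3.5"]],
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"Chi-squared = {chi2:.2f}, p = {p:.4f}")

# The study also compared per-answer confidence ratings (1-5) between
# models with a Mann-Whitney U test; these ratings are made up.
gpt4_conf = [5, 4, 5, 3, 4, 5, 4, 4]
gpt35_conf = [3, 2, 4, 3, 2, 3, 3, 2]
u_stat, p_u = mannwhitneyu(gpt4_conf, gpt35_conf)
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_u:.4f}")
```

With the derived counts, the Chi-squared test yields p < 0.001, consistent with the abstract's claim that GPT-4's improvement over GPT-3.5 is significant.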
Pages: 111-116
Page count: 6
Related Papers
50 records in total
  • [31] The Personification of ChatGPT (GPT-4) - Understanding Its Personality and Adaptability
    Stockli, Leandro
    Joho, Luca
    Lehner, Felix
    Hanne, Thomas
    INFORMATION, 2024, 15 (06)
  • [32] ChatGPT and Patient Information in Nuclear Medicine: GPT-3.5 Versus GPT-4
    Currie, Geoff
    Robbie, Stephanie
    Tually, Peter
    JOURNAL OF NUCLEAR MEDICINE TECHNOLOGY, 2023, 51 (04) : 307 - 313
  • [33] Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard
    Farhat, Faiza
    Chaudhry, Beenish Moalla
    Nadeem, Mohammad
    Sohail, Shahab Saquib
    Madsen, Dag Oivind
    JMIR MEDICAL EDUCATION, 2024, 10
  • [34] GPT-4 Performance for Neurologic Localization
    Lee, Jung-Hyun
    Choi, Eunhee
    McDougal, Robert
    Lytton, William W.
    NEUROLOGY-CLINICAL PRACTICE, 2024, 14 (03)
  • [35] Performance of ChatGPT in ophthalmology exam; human versus AI
    Balci, Ali Safa
    Yazar, Zeliha
    Ozturk, Banu Turgut
    Altan, Cigdem
    INTERNATIONAL OPHTHALMOLOGY, 2024, 44 (01)
  • [36] Correspondence to "Will ChatGPT pass the Polish specialty exam in radiology and diagnostic imaging?"
    Daungsupawong, Hinpetch
    Wiwanitkit, Viroj
    POLISH JOURNAL OF RADIOLOGY, 2023, 88 : E552 - E552
  • [37] ChatGPT performance in the medical specialty exam: An observational study
    Oztermeli, Ayse Dilara
    Oztermeli, Ahmet
    MEDICINE, 2023, 102 (32) : E34673
  • [38] Performance of large language models in the National Dental Licensing Examination in China: a comparative analysis of ChatGPT, GPT-4, and New Bing
    Hu, Ziyang
    Xu, Zhe
    Shi, Ping
    Zhang, Dandan
    Yue, Qu
    Zhang, Jiexia
    Lei, Xin
    Lin, Zitong
    INTERNATIONAL JOURNAL OF COMPUTERIZED DENTISTRY, 2024, 27 (04)
  • [39] Performance evaluation of ChatGPT, GPT-4, and Bard on the official board examination of the Japan Radiology Society
    Toyama, Yoshitaka
    Harigai, Ayaka
    Abe, Mirei
    Nagano, Mitsutoshi
    Kawabata, Masahiro
    Seki, Yasuhiro
    Takase, Kei
    JAPANESE JOURNAL OF RADIOLOGY, 2024, 42 (2) : 201 - 207