Performance of ChatGPT and GPT-4 on Polish National Specialty Exam (NSE) in Ophthalmology

Cited by: 0

Authors:

Ciekalski, Marcin [1]

Laskowski, Maciej [1]

Koperczak, Agnieszka [1]

Smierciak, Maria [1]

Sirek, Sebastian [2]

Affiliations:

[1] Med Univ Silesia, Fac Med Sci Katowice, Student Sci Soc, Dept Ophthalmol, Katowice, Poland

[2] Med Univ Silesia, Fac Med Sci Katowice, Dept Ophthalmol, Katowice, Poland

Source:

POSTEPY HIGIENY I MEDYCYNY DOSWIADCZALNEJ | 2024 / Volume 78 / Issue 01

Keywords:

ophthalmology; ChatGPT; Polish national specialty exam;

DOI:

10.2478/ahem-2024-0006

Chinese Library Classification (CLC) number:

R-3 [Medical research methods]; R3 [Basic medicine]

Discipline classification code:

1001

Abstract:

Introduction: Artificial intelligence (AI) has evolved significantly, driven by advancements in computing power and big data. Technologies like machine learning and deep learning have led to sophisticated models such as GPT-3.5 and GPT-4. This study assesses the performance of these AI models on the Polish National Specialty Exam in ophthalmology, exploring their potential to support research, education, and clinical decision-making in healthcare.

Materials and Methods: The study analyzed 98 questions from the Spring 2023 Polish NSE in Ophthalmology. Questions were categorized into five groups: Physiology & Diagnostics, Clinical & Case Questions, Treatment & Pharmacology, Surgery, and Pediatrics. GPT-3.5 and GPT-4 were tested for their accuracy in answering these questions, with a confidence rating from 1 to 5 assigned to each response. Statistical analyses, including the Chi-squared test and Mann-Whitney U test, were employed to compare the models' performance.

Results: GPT-4 demonstrated a significant improvement over GPT-3.5, correctly answering 63.3% of questions compared to GPT-3.5's 37.8%. GPT-4's performance met the passing criteria for the NSE. The models showed varying degrees of accuracy across different categories, with a notable gap in fields like surgery and pediatrics.

Conclusions: The study highlights the potential of GPT models in aiding clinical decisions and educational purposes in ophthalmology. However, it also underscores the models' limitations, particularly in specialized fields like surgery and pediatrics. The findings suggest that while AI models like GPT-3.5 and GPT-4 can significantly assist in the medical field, they require further development and fine-tuning to address specific challenges in various medical domains.
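As an illustration of the accuracy comparison described in the abstract, the following is a minimal sketch of an uncorrected Pearson Chi-squared test on a 2x2 table. The counts (62 and 37 correct out of 98) are back-calculated from the reported percentages and are an assumption, as is the choice of an uncorrected test: the abstract does not state whether a continuity correction or a paired test was used. The function name `chi_squared_2x2` is hypothetical.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson Chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the Chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

TOTAL_QUESTIONS = 98
# Assumption: counts reconstructed from the reported 63.3% and 37.8%.
gpt4_correct = round(0.633 * TOTAL_QUESTIONS)    # 62
gpt35_correct = round(0.378 * TOTAL_QUESTIONS)   # 37

chi2, p = chi_squared_2x2(
    gpt4_correct, TOTAL_QUESTIONS - gpt4_correct,
    gpt35_correct, TOTAL_QUESTIONS - gpt35_correct,
)
print(f"chi2 = {chi2:.2f}, p = {p:.5f}")
```

Under these assumed counts the statistic is about 12.76 with p below 0.001, which is consistent with the abstract's description of a significant improvement. Because both models answered the same 98 questions, a paired test such as McNemar's would also be a reasonable choice, but it requires per-question results that the abstract does not provide.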

Pages: 111-116

Page count: 6
