Performance of ChatGPT and GPT-4 on Polish National Specialty Exam (NSE) in Ophthalmology

Cited: 0
Authors
Ciekalski, Marcin [1 ]
Laskowski, Maciej [1 ]
Koperczak, Agnieszka [1 ]
Smierciak, Maria [1 ]
Sirek, Sebastian [2 ]
Affiliations
[1] Med Univ Silesia, Fac Med Sci Katowice, Student Sci Soc, Dept Ophthalmol, Katowice, Poland
[2] Med Univ Silesia, Fac Med Sci Katowice, Dept Ophthalmol, Katowice, Poland
Keywords
ophthalmology; ChatGPT; Polish national specialty exam
DOI
10.2478/ahem-2024-0006
Chinese Library Classification
R-3 [Medical Research Methods]; R3 [Basic Medicine]
Discipline Code
1001
Abstract
Introduction: Artificial intelligence (AI) has evolved significantly, driven by advancements in computing power and big data. Technologies such as machine learning and deep learning have led to sophisticated models such as GPT-3.5 and GPT-4. This study assesses the performance of these AI models on the Polish National Specialty Exam (NSE) in ophthalmology, exploring their potential to support research, education, and clinical decision-making in healthcare.

Materials and Methods: The study analyzed 98 questions from the Spring 2023 Polish NSE in Ophthalmology. Questions were categorized into five groups: Physiology & Diagnostics, Clinical & Case Questions, Treatment & Pharmacology, Surgery, and Pediatrics. GPT-3.5 and GPT-4 were tested for accuracy in answering these questions, with a confidence rating from 1 to 5 assigned to each response. Statistical analyses, including the Chi-squared test and the Mann-Whitney U test, were used to compare the models' performance.

Results: GPT-4 demonstrated a significant improvement over GPT-3.5, correctly answering 63.3% of questions compared with GPT-3.5's 37.8%. GPT-4's performance met the passing criteria for the NSE. The models showed varying accuracy across categories, with a notable gap in fields such as surgery and pediatrics.

Conclusions: The study highlights the potential of GPT models to aid clinical decisions and education in ophthalmology. However, it also underscores the models' limitations, particularly in specialized fields such as surgery and pediatrics. The findings suggest that while AI models like GPT-3.5 and GPT-4 can significantly assist in the medical field, they require further development and fine-tuning to address specific challenges across medical domains.
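The accuracy comparison reported above can be illustrated with a 2x2 Chi-squared test of independence. The sketch below uses only the Python standard library and is not the authors' analysis code; the raw counts (62/98 correct for GPT-4, 37/98 for GPT-3.5) are an assumption derived from the rounded percentages 63.3% and 37.8%.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and
    p-value for a 2x2 contingency table [[a, b], [c, d]].
    With 1 degree of freedom, p = erfc(sqrt(chi2 / 2))."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Counts inferred from the abstract: 98 questions per model,
# GPT-4 ~62 correct (63.3%), GPT-3.5 ~37 correct (37.8%).
chi2, p = chi2_2x2(62, 98 - 62, 37, 98 - 37)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # the gap is significant at p < 0.05
```

With these assumed counts the statistic is well above the 3.84 critical value for one degree of freedom, consistent with the abstract's claim of a significant improvement.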
Pages: 111 - 116
Page count: 6
Related Papers
50 records total
  • [21] Harnessing ChatGPT and GPT-4 for evaluating the rheumatology questions of the Spanish access exam to specialized medical training
    Alfredo Madrid-García
    Zulema Rosales-Rosado
    Dalifer Freites-Nuñez
    Inés Pérez-Sancristóbal
    Esperanza Pato-Cour
    Chamaida Plasencia-Rodríguez
    Luis Cabeza-Osorio
    Lydia Abasolo-Alcázar
    Leticia León-Mateos
    Benjamín Fernández-Gutiérrez
    Luis Rodríguez-Rodríguez
    Scientific Reports, 13 (1)
  • [22] Will ChatGPT/GPT-4 be a Lighthouse to Guide Spinal Surgeons?
    Yongbin He
    Haifeng Tang
    Dongxue Wang
    Shuqin Gu
    Guoxin Ni
    Haiyang Wu
    Annals of Biomedical Engineering, 2023, 51 : 1362 - 1365
  • [23] Comparison of GPT-3.5, GPT-4, and human user performance on a practice ophthalmology written examination
    Lin, John C.
    Younessi, David N.
    Kurapati, Sai S.
    Tang, Oliver Y.
    Scott, Ingrid U.
    EYE, 2023, 37 (17) : 3694 - 3695
  • [26] An exploratory assessment of GPT-4o and GPT-4 performance on the Japanese National Dental Examination
    Morishita, Masaki
    Fukuda, Hikaru
    Yamaguchi, Shino
    Muraoka, Kosuke
    Nakamura, Taiji
    Hayashi, Masanari
    Yoshioka, Izumi
    Ono, Kentaro
    Awano, Shuji
    SAUDI DENTAL JOURNAL, 2024, 36 (12) : 1577 - 1581
  • [27] ChatGPT, GPT-4, and Bard and official board examination: comment
    Daungsupawong, Hinpetch
    Wiwanitkit, Viroj
    JAPANESE JOURNAL OF RADIOLOGY, 2024, 42 (02) : 212 - 213
  • [29] Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4
    Chang, Kent K.
    Cramer, Mackenzie
    Soni, Sandeep
    Bamman, David
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 7312 - 7327
  • [30] ChatGPT/GPT-4: enabling a new era of surgical oncology
    Cheng, Kunming
    Wu, Haiyang
    Li, Cheng
    INTERNATIONAL JOURNAL OF SURGERY, 2023, 109 (08) : 2549 - 2550