The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis

Citations: 0
Authors
Bongco, Edgar Dominic A. [1 ,2 ]
Cua, Sean Kendrich N. [1 ,2 ]
Hernandez, Mary Angeline Luz U. [1 ,2 ]
Pascual, Juan Silvestre G. [1 ,2 ]
Khu, Kathleen Joy O. [1 ,2 ]
Affiliations
[1] Univ Philippines Manila, Coll Med, Dept Neurosci, Div Neurosurg, Manila, Philippines
[2] Univ Philippines Manila, Philippine Gen Hosp, Manila, Philippines
Keywords
ChatGPT; Large language model; Neurosurgery education; Neurosurgery board examination;
DOI
10.1007/s10143-024-03144-y
Chinese Library Classification
R74 [Neurology and Psychiatry];
Abstract
Objective: Large language models such as ChatGPT have been used in different fields of medical education. This study aimed to review the literature on the performance of ChatGPT on neurosurgery board examination-like questions compared to that of neurosurgery residents.
Methods: A literature search was performed following PRISMA guidelines, covering the period from ChatGPT's release (November 2022) to October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with neurosurgery residents' scores. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model with alpha set at 0.05.
Results: After screening, six studies were selected for qualitative and quantitative analysis. ChatGPT's accuracy ranged from 50.4% to 78.8%, compared with residents' accuracy of 58.3% to 73.7%. Risk of bias was low in four of the six studies reviewed; the rest had moderate risk. There was an overall trend favoring neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). These findings were similar on subgroup analysis of studies that used Self-Assessment in Neurosurgery (SANS) examination questions. However, on sensitivity analysis, removal of the highest-weighted study skewed the results toward better performance by ChatGPT.
Conclusion: Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement of ChatGPT is necessary before it can become a useful and reliable supplementary tool in the delivery of neurosurgical education.
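The Methods describe inverse-variance fixed-effects pooling with 95% confidence intervals and the I² heterogeneity statistic. A minimal sketch of that computation is given below; the effect sizes and variances are hypothetical placeholders for illustration only, not data from the reviewed studies:

```python
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling with a 95% CI and I^2 heterogeneity."""
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))              # standard error of pooled effect
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)   # 95% confidence interval
    # Cochran's Q and I^2 (percentage of variation due to heterogeneity)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, ci, i2

# Hypothetical per-study effects (e.g., log odds ratios) and variances
effects = [0.40, 0.10, -0.20, 0.55, 0.30, 0.05]
variances = [0.02, 0.05, 0.04, 0.01, 0.03, 0.06]
pooled, ci, i2 = fixed_effect_meta(effects, variances)
```

A high I² (as reported here, 96%) indicates that most of the observed variation across studies reflects true between-study differences rather than sampling error, which is why the sensitivity analysis removing the highest-weighted study changed the direction of the result.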
Pages: 8
Related Articles
50 records
  • [1] Performance of ChatGPT versus Google Bard on Answering Postgraduate-Level Surgical Examination Questions: A Meta-Analysis
    Andrew, Albert
    Zhao, Sunny
    INDIAN JOURNAL OF SURGERY, 2025,
  • [2] Performance of ChatGPT in medical examinations: A systematic review and a meta-analysis
    Levin, Gabriel
    Horesh, Nir
    Brezinov, Yoav
    Meyer, Raanan
    BJOG-AN INTERNATIONAL JOURNAL OF OBSTETRICS AND GYNAECOLOGY, 2024, 131 (03) : 378 - 380
  • [3] Burnout Among Neurosurgeons and Residents in Neurosurgery: A Systematic Review and Meta-Analysis of the Literature
    Zaed, Ismail
    Jaaiddane, Youssef
    Chibbaro, Salvatore
    Tinterri, Benedetta
    WORLD NEUROSURGERY, 2020, 143 : E529 - E534
  • [4] ChatGPT versus the neurosurgical written boards: a comparative analysis of artificial intelligence/machine learning performance on neurosurgical board-style questions
    Hopkins, Benjamin S.
    Nguyen, Vincent N.
    Dallas, Jonathan
    Texakalidis, Pavlos
    Yang, Max
    Renn, Alex
    Guerra, Gage
    Kashif, Zain
    Cheok, Stephanie
    Zada, Gabriel
    Mack, William J.
    JOURNAL OF NEUROSURGERY, 2023, 139 (03) : 904 - 911
  • [5] Delirium in neurosurgery: a systematic review and meta-analysis
    Kappen, P. R.
    Kakar, E.
    Dirven, C. M. F.
    van der Jagt, M.
    Klimek, M.
    Osse, R. J.
    Vincent, A. P. J. E.
    NEUROSURGICAL REVIEW, 2022, 45 (01) : 329 - 341
  • [7] GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions
    Guerra, Gage A.
    Hofmann, Hayden
    Sobhani, Sina
    Hofmann, Grady
    Gomez, David
    Soroudi, Daniel
    Hopkins, Benjamin S.
    Dallas, Jonathan
    Pangal, Dhiraj J.
    Cheok, Stephanie
    Nguyen, Vincent N.
    Mack, William J.
    Zada, Gabriel
    WORLD NEUROSURGERY, 2023, 179 : E160 - E165
  • [8] Neurosurgical Malpractice Litigation: A Systematic Review and Meta-Analysis
    Iqbal, Javed
    Shafique, Muhammad Ashir
    Mustafa, Muhammad Saqlain
    Covell, Michael M.
    Fatima, Afia
    Saboor, Hafiz Abdus
    Nadeem, Abdullah
    Iqbal, Ather
    Iqbal, Muhammad Faheem
    Rangwala, Burhanuddin Sohail
    Hafeez, Muhammad Hassan
    Bowers, Christian A.
    WORLD NEUROSURGERY, 2024, 188 : 55 - 67
  • [9] Letter to the Editor Regarding "Burnout Among Neurosurgeons and Residents in Neurosurgery: A Systematic Review and Meta-Analysis of the Literature"
    Tao, Yichi
    Xu, Haicheng
    Huang, Xin
    WORLD NEUROSURGERY, 2022, 164 : 478 - 478