The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis

Cited: 0
Authors
Bongco, Edgar Dominic A. [1 ,2 ]
Cua, Sean Kendrich N. [1 ,2 ]
Hernandez, Mary Angeline Luz U. [1 ,2 ]
Pascual, Juan Silvestre G. [1 ,2 ]
Khu, Kathleen Joy O. [1 ,2 ]
Affiliations
[1] Univ Philippines Manila, Coll Med, Dept Neurosci, Div Neurosurg, Manila, Philippines
[2] Univ Philippines Manila, Philippine Gen Hosp, Manila, Philippines
Keywords
ChatGPT; Large language model; Neurosurgery education; Neurosurgery board examination
DOI
10.1007/s10143-024-03144-y
Chinese Library Classification (CLC)
R74 [Neurology and Psychiatry]
Abstract
Objective: Large language models such as ChatGPT have been used in different fields of medical education. This study aimed to review the literature on the performance of ChatGPT on neurosurgery board examination-like questions compared with that of neurosurgery residents.
Methods: A literature search was performed following PRISMA guidelines, covering the period from ChatGPT's release (November 2022) until October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared its results with neurosurgery residents' scores. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model, with alpha set at 0.05.
Results: After screening, six studies were selected for qualitative and quantitative analysis. ChatGPT's accuracy ranged from 50.4% to 78.8%, compared with residents' accuracy of 58.3% to 73.7%. Risk of bias was low in four of the six studies reviewed; the rest had moderate risk. The overall effect favored neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). Findings were similar in the subgroup analysis of studies that used Self-Assessment in Neurosurgery (SANS) examination questions. However, in the sensitivity analysis, removing the highest-weighted study shifted the results toward better performance by ChatGPT.
Conclusion: Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement of ChatGPT is necessary before it can become a useful and reliable supplementary tool in the delivery of neurosurgical education.
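As a rough illustration of the pooling approach described in the Methods, the Python sketch below shows a minimal inverse-variance fixed-effect meta-analysis with Cochran's Q and I². The abstract does not specify the effect measure, so an odds ratio comparing ChatGPT with residents is assumed here, and the study counts are hypothetical placeholders, not data from the reviewed studies.

# Illustrative sketch only (not the authors' analysis): fixed-effect inverse-variance
# pooling of study-level log odds ratios, with Cochran's Q and I^2 for heterogeneity.
import math

# Hypothetical per-study counts: (correct_chatgpt, n_chatgpt, correct_residents, n_residents)
studies = [(101, 200, 140, 200), (63, 125, 80, 125), (394, 500, 330, 500)]

weights, effects = [], []
for a, n1, c, n2 in studies:
    b, d = n1 - a, n2 - c                      # incorrect answers in each group
    log_or = math.log((a * d) / (b * c))       # log odds ratio, ChatGPT vs. residents
    var = 1 / a + 1 / b + 1 / c + 1 / d        # variance of the log odds ratio
    weights.append(1 / var)                    # inverse-variance weight
    effects.append(log_or)

pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
se = math.sqrt(1 / sum(weights))
ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # 95% confidence interval

q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
df = len(studies) - 1
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0               # I^2 as a percentage

print(f"Pooled log OR = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), I^2 = {i2:.0f}%")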
Pages: 8