The performance of ChatGPT versus neurosurgery residents in neurosurgical board examination-like questions: a systematic review and meta-analysis

Citations: 0
Authors
Bongco, Edgar Dominic A. [1 ,2 ]
Cua, Sean Kendrich N. [1 ,2 ]
Hernandez, Mary Angeline Luz U. [1 ,2 ]
Pascual, Juan Silvestre G. [1 ,2 ]
Khu, Kathleen Joy O. [1 ,2 ]
Affiliations
[1] Univ Philippines Manila, Coll Med, Dept Neurosci, Div Neurosurg, Manila, Philippines
[2] Univ Philippines Manila, Philippine Gen Hosp, Manila, Philippines
Keywords
ChatGPT; Large language model; Neurosurgery education; Neurosurgery board examination;
DOI
10.1007/s10143-024-03144-y
Chinese Library Classification
R74 [Neurology and Psychiatry];
Abstract
Objective: Large language models such as ChatGPT have been used in different fields of medical education. This study aimed to review the literature on the performance of ChatGPT on neurosurgery board examination-like questions compared to that of neurosurgery residents.
Methods: A literature search was performed following PRISMA guidelines, covering the period from ChatGPT's release (November 2022) to October 25, 2024. Two reviewers screened for eligible studies, selecting those that used ChatGPT to answer neurosurgery board examination-like questions and compared the results with neurosurgery residents' scores. Risk of bias was assessed using the JBI critical appraisal tool. Overall effect sizes and 95% confidence intervals were determined using a fixed-effects model with alpha set at 0.05.
Results: After screening, six studies were selected for qualitative and quantitative analysis. ChatGPT's accuracy ranged from 50.4% to 78.8%, compared with residents' accuracy of 58.3% to 73.7%. Risk of bias was low in four of the six studies reviewed; the rest had moderate risk. There was an overall trend favoring neurosurgery residents over ChatGPT (p < 0.00001), with high heterogeneity (I² = 96%). These findings were similar on subgroup analysis of studies that used Self-Assessment in Neurosurgery (SANS) examination questions. However, on sensitivity analysis, removal of the highest-weighted study skewed the results toward better performance by ChatGPT.
Conclusion: Our meta-analysis showed that neurosurgery residents performed better than ChatGPT in answering neurosurgery board examination-like questions, although the reviewed studies had high heterogeneity. Further improvement of ChatGPT is necessary before it can become a useful and reliable supplementary tool in the delivery of neurosurgical education.
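The Methods describe inverse-variance fixed-effects pooling with 95% confidence intervals and the I² heterogeneity statistic. A minimal sketch of that computation is given below; the effect sizes and variances are hypothetical placeholders for illustration only, not data from the reviewed studies:

```python
import math

def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect pooling with a 95% CI and I^2 heterogeneity."""
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))              # standard error of pooled effect
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)   # 95% confidence interval
    # Cochran's Q and I^2 (percentage of variation due to heterogeneity)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, ci, i2

# Hypothetical per-study effects (e.g., log odds ratios) and variances
effects = [0.40, 0.10, -0.20, 0.55, 0.30, 0.05]
variances = [0.02, 0.05, 0.04, 0.01, 0.03, 0.06]
pooled, ci, i2 = fixed_effect_meta(effects, variances)
```

A high I² (as reported here, 96%) indicates that most of the observed variation across studies reflects true between-study differences rather than sampling error, which is why the sensitivity analysis removing the highest-weighted study changed the direction of the result.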
Pages: 8
Related Articles
50 records
  • [1] Performance of ChatGPT versus Google Bard on Answering Postgraduate-Level Surgical Examination Questions: A Meta-Analysis
    Andrew, Albert
    Zhao, Sunny
    INDIAN JOURNAL OF SURGERY, 2025,
  • [2] Performance of ChatGPT in medical examinations: A systematic review and a meta-analysis
    Levin, Gabriel
    Horesh, Nir
    Brezinov, Yoav
    Meyer, Raanan
    BJOG-AN INTERNATIONAL JOURNAL OF OBSTETRICS AND GYNAECOLOGY, 2024, 131 (03) : 378 - 380
  • [3] Burnout Among Neurosurgeons and Residents in Neurosurgery: A Systematic Review and Meta-Analysis of the Literature
    Zaed, Ismail
    Jaaiddane, Youssef
    Chibbaro, Salvatore
    Tinterri, Benedetta
    WORLD NEUROSURGERY, 2020, 143 : E529 - E534
  • [4] ChatGPT versus the neurosurgical written boards: a comparative analysis of artificial intelligence/machine learning performance on neurosurgical board-style questions
    Hopkins, Benjamin S.
    Nguyen, Vincent N.
    Dallas, Jonathan
    Texakalidis, Pavlos
    Yang, Max
    Renn, Alex
    Guerra, Gage
    Kashif, Zain
    Cheok, Stephanie
    Zada, Gabriel
    Mack, William J.
    JOURNAL OF NEUROSURGERY, 2023, 139 (03) : 904 - 911
  • [5] Delirium in neurosurgery: a systematic review and meta-analysis
    Kappen, P. R.
    Kakar, E.
    Dirven, C. M. F.
    van der Jagt, M.
    Klimek, M.
    Osse, R. J.
    Vincent, A. P. J. E.
    NEUROSURGICAL REVIEW, 2022, 45 (01) : 329 - 341
  • [7] GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions
    Guerra, Gage A.
    Hofmann, Hayden
    Sobhani, Sina
    Hofmann, Grady
    Gomez, David
    Soroudi, Daniel
    Hopkins, Benjamin S.
    Dallas, Jonathan
    Pangal, Dhiraj J.
    Cheok, Stephanie
    Nguyen, Vincent N.
    Mack, William J.
    Zada, Gabriel
    WORLD NEUROSURGERY, 2023, 179 : E160 - E165
  • [8] Neurosurgical Malpractice Litigation: A Systematic Review and Meta-Analysis
    Iqbal, Javed
    Shafique, Muhammad Ashir
    Mustafa, Muhammad Saqlain
    Covell, Michael M.
    Fatima, Afia
    Saboor, Hafiz Abdus
    Nadeem, Abdullah
    Iqbal, Ather
    Iqbal, Muhammad Faheem
    Rangwala, Burhanuddin Sohail
    Hafeez, Muhammad Hassan
    Bowers, Christian A.
    WORLD NEUROSURGERY, 2024, 188 : 55 - 67
  • [9] Letter to the Editor Regarding "Burnout Among Neurosurgeons and Residents in Neurosurgery: A Systematic Review and Meta-Analysis of the Literature"
    Tao, Yichi
    Xu, Haicheng
    Huang, Xin
    WORLD NEUROSURGERY, 2022, 164 : 478 - 478