GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions

Cited by: 45

Authors
Guerra, Gage A. [1 ]
Hofmann, Hayden [1 ]
Sobhani, Sina [1 ]
Hofmann, Grady [2 ]
Gomez, David [1 ]
Soroudi, Daniel [3 ]
Hopkins, Benjamin S. [1 ]
Dallas, Jonathan [1 ]
Pangal, Dhiraj J. [1 ]
Cheok, Stephanie [1 ]
Nguyen, Vincent N. [1 ]
Mack, William J. [1 ]
Zada, Gabriel [1 ]
Affiliations
[1] Univ Southern Calif, Dept Neurosurg, Los Angeles, CA 90007 USA
[2] Stanford Univ, Dept Biol, Palo Alto, CA USA
[3] Univ Calif San Francisco, Sch Med, San Francisco, CA USA
Keywords
Artificial intelligence; ChatGPT; GPT-4; Machine learning; Neurosurgical boards; Neurosurgical training; SANS question
DOI
10.1016/j.wneu.2023.08.042
Chinese Library Classification
R74 [Neurology and Psychiatry]
Abstract
BACKGROUND: Artificial intelligence (AI) and machine learning have transformed health care, with applications across many specialized fields. Neurosurgery can benefit from AI in surgical planning, prediction of patient outcomes, and analysis of neuroimaging data. GPT-4, an updated language model with additional training parameters, has exhibited exceptional performance on standardized exams. This study examines GPT-4's competence on neurosurgical board-style questions, comparing its performance with that of medical students and residents, to explore its potential in medical education and clinical decision-making.
METHODS: GPT-4's performance was examined on 643 Congress of Neurological Surgeons Self-Assessment Neurosurgery Exam (SANS) board-style questions from various neurosurgery subspecialties. Of these, 477 were text-based and 166 contained images. GPT-4 refused to answer 52 questions that contained no text. The remaining 591 questions were input into GPT-4, and its performance was evaluated based on first-time responses. Raw scores were analyzed across subspecialties and question types and then compared with previous findings on ChatGPT's performance against SANS users, medical students, and neurosurgery residents.
RESULTS: GPT-4 attempted 91.9% of the SANS questions and achieved 76.6% accuracy. Its accuracy increased to 79.0% for text-only questions. GPT-4 outperformed ChatGPT (P < 0.001), scoring highest in the pain/peripheral nerve category (84%) and lowest in the spine category (73%). It exceeded the performance of medical students (26.3%), neurosurgery residents (61.5%), and the national average of SANS users (69.3%) across all categories.
CONCLUSIONS: GPT-4 significantly outperformed medical students, neurosurgery residents, and the national average of SANS users. The model's accuracy suggests potential applications in educational settings and clinical decision-making, enhancing provider efficiency and improving patient care.
Pages: E160-E165 (6 pages)