The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions

Cited by: 17

Authors:

Hofmann, Hayden L. [1,3]

Guerra, Gage A. [1]

Le, Jonathan L. [1]

Wong, Alexander M. [1]

Hofmann, Grady H. [2]

Mayfield, Cory K. [1]

Petrigliano, Frank A. [1]

Liu, Joseph N. [1]

Affiliations:

[1] Keck Med USC, USC Epstein Family Ctr Sports Med, Los Angeles, CA USA

[2] Stanford Univ, Dept Biol, Palo Alto, CA USA

[3] Keck Med USC, USC Epstein Family Ctr Sports Med, 1520 San Pablo St 2000, Los Angeles, CA 90033 USA

Source:

ORTHOPEDICS | 2024 / Vol. 47 / Issue 02

DOI:

10.3928/01477447-20230922-05

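Chinese Library Classification (CLC) number:

R826.8 [Plastic surgery]; R782.2 [Oral and maxillofacial plastic surgery]; R726.2 [Pediatric plastic surgery]; R62 [Plastic surgery (reconstructive surgery)];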

Abstract:

Advances in artificial intelligence and machine learning models, like Chat Generative Pre-trained Transformer (ChatGPT), have occurred at a remarkably fast rate. OpenAI released its newest model of ChatGPT, GPT-4, in March 2023. It offers a wide range of medical applications. The model has demonstrated notable proficiency on many medical board examinations. This study sought to assess GPT-4's performance on the Orthopaedic In-Training Examination (OITE), which is used to prepare residents for the American Board of Orthopaedic Surgery (ABOS) Part I Examination. The data gathered from GPT-4's performance were additionally compared with the data of the previous iteration of ChatGPT, GPT-3.5, which was released 4 months before GPT-4. GPT-4 correctly answered 251 of the 396 attempted questions (63.4%), whereas GPT-3.5 correctly answered 46.3% of 410 attempted questions. GPT-4 was significantly more accurate than GPT-3.5 on orthopedic board-style questions (P<.00001). GPT-4's performance is most comparable to that of an average third-year orthopedic surgery resident, while GPT-3.5 performed below an average orthopedic intern. GPT-4's overall accuracy was just below the approximate threshold that indicates a likely pass on the ABOS Part I Examination. Our results demonstrate significant improvements in OpenAI's newest model, GPT-4. Future studies should assess potential clinical applications as AI models continue to be trained on larger data sets and offer more capabilities. [Orthopedics. 2024;47(2):e85-e89.]
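The accuracy comparison reported in the abstract can be checked with a short sketch. The abstract does not say which statistical test the authors used, so the pooled two-proportion z-test below is an assumption chosen for illustration, not the paper's method. The GPT-3.5 correct count of 190 is also an assumption: it is back-calculated from the reported 46.3% of 410 questions, since the abstract gives only the percentage. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Pooled two-proportion z-test; returns both proportions, z, and a two-sided p-value."""
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    # Pooled proportion under the null hypothesis that both accuracies are equal.
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value


# GPT-4: 251 of 396 correct (from the abstract).
# GPT-3.5: 190 of 410 correct (ASSUMED; back-calculated from the reported 46.3%).
acc_gpt4, acc_gpt35, z, p_value = two_proportion_z_test(251, 396, 190, 410)

print(f"GPT-4 accuracy:   {acc_gpt4:.1%}")
print(f"GPT-3.5 accuracy: {acc_gpt35:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```

Under these assumptions the p-value falls below the reported threshold of .00001, which is consistent with the abstract. A chi-square test on the same 2x2 table would be an equally reasonable choice.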

Citation

Pages: e85-e89

Page count: 6
