The ability of artificial intelligence tools to formulate orthopaedic clinical decisions in comparison to human clinicians: An analysis of ChatGPT 3.5, ChatGPT 4, and Bard

Cited by: 6
Authors
Agharia, Suzen [1 ]
Szatkowski, Jan [2 ]
Fraval, Andrew [1 ]
Stevens, Jarrad [1 ]
Zhou, Yushy [1 ,3 ]
Affiliations
[1] St Vincents Hosp, Dept Orthopaed Surg, Melbourne, Vic, Australia
[2] Indiana Univ Hlth Methodist Hosp, Dept Orthopaed Surg, Indianapolis, IN USA
[3] Level 2, Clin Sci Bldg, 29 Regent St, Fitzroy, Vic 3065, Australia
Keywords
AI; CHALLENGES; QUESTIONS
DOI
10.1016/j.jor.2023.11.063
Chinese Library Classification
R826.8 [Plastic Surgery]; R782.2 [Oral and Maxillofacial Plastic Surgery]; R726.2 [Pediatric Plastic Surgery]; R62 [Plastic Surgery (Reconstructive Surgery)];
Abstract
Background: Recent advancements in artificial intelligence (AI) have sparked interest in its integration into clinical medicine and education. This study evaluates the performance of three AI tools compared to human clinicians in addressing complex orthopaedic decisions in real-world clinical cases.
Questions/purposes: To evaluate the ability of commonly used AI tools to formulate orthopaedic clinical decisions in comparison to human clinicians.
Patients and methods: The study used OrthoBullets Cases, a publicly available clinical case collaboration platform on which surgeons from around the world choose treatment options through peer-reviewed, standardised treatment polls. The clinical cases cover various orthopaedic categories. Three AI tools (ChatGPT 3.5, ChatGPT 4, and Bard) were evaluated. Uniform prompts were used to input case information, including questions relating to each case, and the AI tools' responses were analysed for alignment with the most popular human response, as well as for responses within 10% and within 20% of the most popular human response.
Results: In total, 8 clinical categories comprising 97 questions were analysed. ChatGPT 4 demonstrated the highest proportion of most popular responses (proportion of most popular response: ChatGPT 4 68.0%, ChatGPT 3.5 40.2%, Bard 45.4%, P value < 0.001), outperforming the other AI tools. AI tools performed worse on questions considered controversial (those with disagreement among human responses). Inter-tool agreement, evaluated using Cohen's kappa coefficient, ranged from 0.201 (ChatGPT 4 vs. Bard) to 0.634 (ChatGPT 3.5 vs. Bard). However, AI tool responses varied widely, reflecting a need for consistency in real-world clinical applications.
Conclusions: While AI tools demonstrated potential use in educational contexts, their integration into clinical decision-making requires caution due to inconsistent responses and deviations from peer consensus. Future research should focus on specialised clinical AI tool development to maximise utility in clinical decision-making.
Level of evidence: IV.
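To illustrate the kind of analysis the abstract describes, the following is a minimal Python sketch of the two headline metrics: the proportion of a tool's answers matching the most popular human poll response, and pairwise inter-tool agreement via Cohen's kappa. The tool names mirror the study, but the sample answers, poll counts, and variable names are hypothetical illustrations, not the authors' data or code.

```python
# Minimal sketch (not the study's code): proportion of answers matching the
# most popular human poll response, plus pairwise Cohen's kappa between tools.
# All answers and poll counts below are hypothetical.
from sklearn.metrics import cohen_kappa_score

# Hypothetical data: for each of 5 questions, the option each tool chose
# and the distribution of human poll votes over options "A"-"D".
tool_answers = {
    "ChatGPT 4":   ["A", "B", "A", "C", "D"],
    "ChatGPT 3.5": ["A", "C", "A", "C", "B"],
    "Bard":        ["B", "C", "A", "C", "D"],
}
human_polls = [
    {"A": 62, "B": 20, "C": 10, "D": 8},
    {"A": 15, "B": 55, "C": 25, "D": 5},
    {"A": 48, "B": 30, "C": 12, "D": 10},
    {"A": 5,  "B": 10, "C": 70, "D": 15},
    {"A": 25, "B": 25, "C": 20, "D": 30},
]

# Most popular human response for each question.
most_popular = [max(poll, key=poll.get) for poll in human_polls]

# Proportion of each tool's answers matching the most popular human response.
for tool, answers in tool_answers.items():
    matches = sum(a == p for a, p in zip(answers, most_popular))
    print(f"{tool}: {matches / len(answers):.1%} most-popular matches")

# Pairwise inter-tool agreement (Cohen's kappa), analogous to the
# agreement statistic reported in the abstract.
tools = list(tool_answers)
for i in range(len(tools)):
    for j in range(i + 1, len(tools)):
        kappa = cohen_kappa_score(tool_answers[tools[i]], tool_answers[tools[j]])
        print(f"kappa({tools[i]}, {tools[j]}) = {kappa:.3f}")
```

The "within 10%" and "within 20%" thresholds described in the methods could be added by checking whether a tool's chosen option received at least (most popular vote share minus 10 or 20 percentage points) of the human votes; the exact operationalisation used by the authors is not specified here.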
Pages: 1-7
Number of pages: 7