The ability of artificial intelligence tools to formulate orthopaedic clinical decisions in comparison to human clinicians: An analysis of ChatGPT 3.5, ChatGPT 4, and Bard

Cited: 6
Authors
Agharia, Suzen [1 ]
Szatkowski, Jan [2 ]
Fraval, Andrew [1 ]
Stevens, Jarrad [1 ]
Zhou, Yushy [1 ,3 ]
Affiliations
[1] St Vincents Hosp, Dept Orthopaed Surg, Melbourne, Vic, Australia
[2] Indiana Univ Hlth Methodist Hosp, Dept Orthopaed Surg, Indianapolis, IN USA
[3] Level 2, Clin Sci Bldg, 29 Regent St, Fitzroy, Vic 3065, Australia
Keywords
AI; CHALLENGES; QUESTIONS
DOI
10.1016/j.jor.2023.11.063
Chinese Library Classification (CLC)
R826.8 [Plastic Surgery]; R782.2 [Oral and Maxillofacial Plastic Surgery]; R726.2 [Pediatric Plastic Surgery]; R62 [Plastic Surgery (Reconstructive Surgery)]
Abstract
Background: Recent advancements in artificial intelligence (AI) have sparked interest in its integration into clinical medicine and education. This study evaluates the performance of three AI tools compared to human clinicians in addressing complex orthopaedic decisions in real-world clinical cases.
Questions/purposes: To evaluate the ability of commonly used AI tools to formulate orthopaedic clinical decisions in comparison to human clinicians.
Patients and methods: The study used OrthoBullets Cases, a publicly available clinical case collaboration platform where surgeons from around the world choose treatment options in peer-reviewed, standardised treatment polls. The clinical cases cover various orthopaedic categories. Three AI tools (ChatGPT 3.5, ChatGPT 4, and Bard) were evaluated. Uniform prompts were used to input case information, including questions relating to each case, and the AI tools' responses were analysed for alignment with the most popular human response, as well as for responses within 10% and within 20% of the most popular human response.
Results: In total, 8 clinical categories comprising 97 questions were analysed. ChatGPT 4 demonstrated the highest proportion of most popular responses (ChatGPT 4 68.0%, ChatGPT 3.5 40.2%, Bard 45.4%; P < 0.001), outperforming the other AI tools. The AI tools performed more poorly on questions considered controversial (where human responses disagreed). Inter-tool agreement, evaluated using Cohen's kappa coefficient, ranged from 0.201 (ChatGPT 4 vs. Bard) to 0.634 (ChatGPT 3.5 vs. Bard). AI tool responses also varied widely, underscoring the need for consistency in real-world clinical applications.
Conclusions: While AI tools demonstrated potential for use in educational contexts, their integration into clinical decision-making warrants caution due to inconsistent responses and deviations from peer consensus. Future research should focus on developing specialised clinical AI tools to maximise their utility in clinical decision-making.
Level of evidence: IV.
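As a rough illustration of the analyses the abstract describes (alignment with the most popular human poll option, the within-10%/within-20% margins, and inter-tool agreement via Cohen's kappa), the following Python sketch uses placeholder data; the vote shares, tool choices, and variable names are illustrative assumptions, not the study's data or the authors' code.

from sklearn.metrics import cohen_kappa_score

# Hypothetical poll data: human vote shares per option for each question.
human_vote_shares = [
    {"A": 0.62, "B": 0.28, "C": 0.10},  # question 1: clear consensus
    {"A": 0.35, "B": 0.40, "C": 0.25},  # question 2: controversial
]
# Hypothetical option choices made by two AI tools on the same questions.
gpt4_choices = ["A", "A"]
bard_choices = ["A", "C"]

def within_margin_of_top(choice, shares, margin=0.0):
    # True if the chosen option's vote share is within `margin` of the
    # most popular option's share (margin=0 means it IS the top option).
    return shares[choice] >= max(shares.values()) - margin

# Proportion of answers matching the most popular human response,
# plus the within-10% and within-20% variants reported in the paper.
for margin in (0.0, 0.10, 0.20):
    hits = sum(within_margin_of_top(c, s, margin)
               for c, s in zip(gpt4_choices, human_vote_shares))
    print(f"within {margin:.0%} of top: {hits / len(gpt4_choices):.0%}")

# Inter-tool agreement (the paper reports kappas from 0.201 to 0.634).
print("Cohen's kappa:", cohen_kappa_score(gpt4_choices, bard_choices))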
Pages: 1-7
Page count: 7
Related Papers
50 records
  • [41] Comment on: ‘Comparison of GPT-3.5, GPT-4, and human user performance on a practice ophthalmology written examination’ and ‘ChatGPT in ophthalmology: the dawn of a new era?’
    Nima Ghadiri
    EYE, 2024, 38 : 654 - 655
  • [42] Can large language models pass official high-grade exams of the European Society of Neuroradiology courses? A direct comparison between OpenAI chatGPT 3.5, OpenAI GPT4 and Google Bard
    D'Anna, Gennaro
    Van Cauter, Sofie
    Thurnher, Majda
    Van Goethem, Johan
    Haller, Sven
    NEURORADIOLOGY, 2024, 66 (08) : 1245 - 1250
  • [43] Artificial Intelligence-Based Chatbots' Ability to Interpret Mammography Images: A Comparison of Chat-GPT 4o and Claude 3.5
    Karahan, Betul Nalan
    Emekli, Emre
    Altin, Mahmut Altug
    EUROPEAN JOURNAL OF THERAPEUTICS, 2025, 31 (01) : 28 - 34
  • [44] Self-Captured Images Recognition by Artificial Intelligence (AI) in Common Nephrology Medications: A Comparative Analysis of ChatGPT-4 and Claude 3 Opus
    Sheikh, M. Salman
    Dreesman, Benjamin
    Barreto, Erin F.
    Miao, Jing
    Thongprayoon, Charat
    Qureshi, Fawad
    Craici, Iasmina
    Kashani, Kianoush
    Cheungpasitporn, Wisit
    JOURNAL OF THE AMERICAN SOCIETY OF NEPHROLOGY, 2024, 35 (10)
  • [45] Advancing Artificial Intelligence for Clinical Knowledge Retrieval: A Case Study Using ChatGPT-4 and Link Retrieval Plug-In to Analyze Diabetic Ketoacidosis Guidelines
    Hamed, Ehab
    Sharif, Anna
    Eid, Ahmad
    Alfehaidi, Alanoud
    Alberry, Medhat
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (07)
  • [46] Artificial Intelligence Tools and Bias in Journalism-related Content Generation: Comparison Between Chat GPT3.5, GPT-4 and Bing
    Castillo-Campos, Mar
    Varona-Aramburu, David
    Becerra-Alonso, David
    TRIPODOS, 2024, (55) : 99 - 115
  • [47] Artificial Intelligence in Ophthalmology: A Comparative Analysis of GPT-3.5, GPT-4, and Human Expertise in Answering StatPearls Questions
    Moshirfar, Majid
    Altaf, Amal W.
    Stoakes, Isabella M.
    Tuttle, Jared J.
    Hoopes, Phillip C.
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (06)
  • [48] Quality of Information Provided by Artificial Intelligence Chatbots Surrounding the Reconstructive Surgery for Head and Neck Cancer: A Comparative Analysis Between ChatGPT4 and Claude2
    Boscolo-Rizzo, Paolo
    Marcuzzo, Alberto Vito
    Lazzarin, Chiara
    Giudici, Fabiola
    Polesel, Jerry
    Stellin, Marco
    Pettorelli, Andrea
    Spinato, Giacomo
    Ottaviano, Giancarlo
    Ferrari, Marco
    Borsetto, Daniele
    Zucchini, Simone
    Trabalzini, Franco
    Sia, Egidio
    Gardenal, Nicoletta
    Baruca, Roberto
    Fortunati, Alfonso
    Vaira, Luigi Angelo
    Tirelli, Giancarlo
    CLINICAL OTOLARYNGOLOGY, 2025, 50 (02) : 330 - 335
  • [49] Artificial Intelligence (ChatGPT-4o) in Adjuvant Treatment Decision-Making for Stage II Colon Cancer: A Comparative Analysis with Clinician Recommendations and NCCN/ESMO Guidelines
    Kus, Fatih
    Chalabiyev, Elvin
    Yildirim, Hasan Cagri
    Koc Kus, Ilgin
    Sirvan, Firat
    Dizdar, Omer
    Yalcin, Suayib
    UHOD-ULUSLARARASI HEMATOLOJI-ONKOLOJI DERGISI, 2025, 35 (01) : 68 - 74
  • [50] Evaluating Artificial Intelligence in Spinal Cord Injury Management: A Comparative Analysis of ChatGPT-4o and Google Gemini Against American College of Surgeons Best Practices Guidelines for Spine Injury
    Yu, Alexander
    Li, Albert
    Ahmed, Wasil
    Saturno, Michael
    Cho, Samuel K.
    GLOBAL SPINE JOURNAL, 2025,