A pilot evaluation of the diagnostic accuracy of ChatGPT-3.5 for multiple sclerosis from case reports

Cited by: 0
Authors
Joseph, Anika [1 ]
Joseph, Kevin [2 ]
Joseph, Angelyn [3 ]
Affiliations
[1] Univ Ottawa, Hlth Sci Program, 75 Laurier Ave E, Ottawa, ON K1N 6N5, Canada
[2] Univ Ottawa, Biomed Sci Program, 75 Laurier Ave E, Ottawa, ON K1N 6N5, Canada
[3] Merivale High Sch, 1755 Merivale Rd, Nepean, ON K2G 1E2, Canada
Keywords
artificial intelligence; multiple sclerosis; case reports; legal
DOI
10.1515/tnsci-2022-0361
CLC Classification Code
Q189 [Neuroscience]
Subject Classification Code
071006
Abstract
The limitations of artificial intelligence (AI) large language models in diagnosing disease remain underexplored from a patient-safety perspective, and potential problems such as diagnostic errors and legal liability need to be addressed. To illustrate these limitations, we used ChatGPT-3.5, developed by OpenAI, as a tool for medical diagnosis from text-based case reports of multiple sclerosis (MS), which was selected as a prototypic disease. We analyzed 98 peer-reviewed case reports, selected for free full-text availability and publication within the past decade (2014-2024), with any mention of an MS diagnosis removed to avoid bias. ChatGPT-3.5 was prompted to interpret the clinical presentations and laboratory data from these reports. The model correctly diagnosed MS in 77 of the 98 cases, an accuracy of 78.6%; the remaining 21 cases were misdiagnosed, highlighting the model's limitations. Factors contributing to these errors include variability in how the data were presented and the inherent complexity of MS diagnosis, which requires imaging in addition to clinical presentation and laboratory data. While the findings suggest that AI can support disease diagnosis and aid healthcare providers in decision-making, inadequate training on large datasets may lead to significant inaccuracies. Integrating AI into clinical practice therefore necessitates rigorous validation and robust regulatory frameworks to ensure responsible use.
Pages: 7
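The evaluation workflow summarized in the abstract (prompting ChatGPT-3.5 with de-identified case-report text and tallying how often the returned diagnosis is MS) could be reproduced along the following lines. This is a minimal sketch, not the authors' code: it assumes the OpenAI Python client (openai >= 1.0), an API key in the environment, and a hypothetical `case_reports` list holding the report texts with explicit MS mentions already removed.

```python
# Minimal sketch of the evaluation workflow described in the abstract; not the
# authors' code. Each de-identified case report is sent to ChatGPT-3.5 and the
# reply is checked for a multiple sclerosis diagnosis. Assumes the OpenAI
# Python client (openai >= 1.0) with OPENAI_API_KEY set in the environment;
# `case_reports` is a hypothetical list of report texts with explicit MS
# mentions already removed.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Based on the clinical presentation and laboratory data in the case "
    "report below, state the single most likely diagnosis.\n\n{report}"
)


def diagnose(report: str) -> str:
    """Ask ChatGPT-3.5 for the most likely diagnosis of one case report."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": PROMPT.format(report=report)}],
        temperature=0,  # reduce response variability for a reproducible tally
    )
    return response.choices[0].message.content or ""


def diagnostic_accuracy(case_reports: list[str]) -> float:
    """Fraction of reports whose returned diagnosis mentions multiple sclerosis."""
    hits = sum("multiple sclerosis" in diagnose(r).lower() for r in case_reports)
    return hits / len(case_reports)


# With the paper's figures, 77 correct diagnoses out of 98 reports:
# 77 / 98 ≈ 0.786, i.e. the reported 78.6% accuracy.
```

The substring match on "multiple sclerosis" is a simplification for illustration; in the study itself, correctness of each response would have been judged by the authors reading the model's output.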