ChatGPT in Iranian medical licensing examination: evaluating the diagnostic accuracy and decision-making capabilities of an AI-based model

Cited by: 14
Authors
Ebrahimian, Manoochehr [1 ]
Behnam, Behdad [2 ]
Ghayebi, Negin [3 ]
Sobhrakhshankhah, Elham [2 ]
Affiliations
[1] Shahid Beheshti Univ Med Sci, Res Inst Childrens Hlth, Pediat Surg Res Ctr, Tehran, Iran
[2] Iran Univ Med Sci, Gastrointestinal & Liver Dis Res Ctr, Tehran, Iran
[3] Shahid Beheshti Univ Med Sci, Sch Med, Tehran, Iran
Keywords
Artificial intelligence; Decision making, computer-assisted; Neural networks, computer; Risks
DOI
10.1136/bmjhci-2023-100815
Chinese Library Classification
R19 [Health organisation and services (health administration)]
Discipline Classification Code
Abstract
Introduction: Large language models such as ChatGPT have gained popularity for their ability to generate comprehensive responses to human queries. In the field of medicine, ChatGPT has shown promise in applications ranging from diagnostics to decision-making. However, its performance in medical examinations, and how it compares with random guessing, has not been extensively studied.
Methods: This study aimed to evaluate the performance of ChatGPT in the preinternship examination, a comprehensive medical assessment for students in Iran. The examination consisted of 200 multiple-choice questions categorised into basic science evaluation, diagnosis and decision-making. GPT-4 was used, and the questions were translated into English. A statistical analysis was conducted to assess the performance of ChatGPT and to compare it with a random test group.
Results: ChatGPT performed exceptionally well, answering 68.5% of the questions correctly and significantly surpassing the pass mark of 45%. It exhibited superior performance in decision-making and passed in all specialties. ChatGPT's performance was significantly higher than that of the random test group, demonstrating its ability to provide more accurate responses and reasoning.
Conclusion: This study highlights the potential of ChatGPT in medical licensing examinations and its advantage over random guessing. However, ChatGPT still falls short of human physicians in terms of diagnostic accuracy and decision-making capabilities. Caution should be exercised when using ChatGPT, and its results should be verified by human experts to ensure patient safety and avoid potential errors in the medical field.
Pages: 6
Related papers (50 records)
  • [1] AI Versus MD: Evaluating the surgical decision-making accuracy of ChatGPT-4
    Palenzuela, Deanna L.
    Mullen, John T.
    Phitayakorn, Roy
    SURGERY, 2024, 176 (02) : 241 - 245
  • [2] Enhancing medical decision-making with ChatGPT and explainable AI
    Chopra, Aryan
    Rajput, Dharmendra Singh
    Patel, Harshita
    INTERNATIONAL JOURNAL OF SURGERY, 2024, 110 (08) : 5167 - 5168
  • [3] AI-supported decision-making in obstetrics - a feasibility study on the medical accuracy and reliability of ChatGPT
    Bader, Simon
    Schneider, Michael O.
    Psilopatis, Iason
    Anetsberger, Daniel
    Emons, Julius
    Kehl, Sven
    ZEITSCHRIFT FUR GEBURTSHILFE UND NEONATOLOGIE, 2025, 229 (01): : 15 - 21
  • [4] 'Smart' Choice? Evaluating AI-Based mobile decision bots for in-store decision-making
    Chattaraman, Veena
    Kwon, Wi-Suk
    Ross, Kassandra
    Sung, Jihyun
    Alikhademi, Kiana
    Richardson, Brianna
    Gilbert, Juan E.
    JOURNAL OF BUSINESS RESEARCH, 2024, 183
  • [5] Letter to the editor on: "AI versus MD: Evaluating the surgical decision-making accuracy of ChatGPT-4"
    Daungsupawong, Hinpetch
    Wiwanitkit, Viroj
    SURGERY, 2024, 176 (06) : 1782 - 1782
  • [6] A Comparative Analysis of AI Models in Complex Medical Decision-Making Scenarios: Evaluating ChatGPT, Claude AI, Bard, and Perplexity
    Uppalapati, Vamsi Krishna
    Nag, Deb Sanjay
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (01)
  • [7] Pediatric dermatologists vs AI bots: Evaluating the medical knowledge and diagnostic capabilities of ChatGPT
    Huang, C.
    Zhang, E.
    Margozzini, M. Caussade
    Brown, T.
    Hogrogian, G. Stockton
    Yan, A.
    JOURNAL OF INVESTIGATIVE DERMATOLOGY, 2024, 144 (08) : S39 - S39
  • [8] Pediatric dermatologists versus AI bots: Evaluating the medical knowledge and diagnostic capabilities of ChatGPT
    Huang, Charles Y.
    Zhang, Esther
    Caussade, Marie-Chantal
    Brown, Trinity
    Stockton Hogrogian, Griffin
    Yan, Albert C.
    PEDIATRIC DERMATOLOGY, 2024, 41 (05) : 831 - 834
  • [9] Failures in the Loop: Human Leadership in AI-Based Decision-Making
    Michael, Katina
    Schoenherr, Jordan Richard
    Vogel, Kathleen M.
    IEEE Transactions on Technology and Society, 2024, 5 (01): : 2 - 13
  • [10] AI-based Decision-making Model for the Development of a Manufacturing Company in the context of Industry 4.0
    Patalas-Maliszewska, Justyna
    Pajak, Iwona
    Skrzeszewska, Malgorzata
    2020 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2020,