Performance of ChatGPT on Nephrology Test Questions

Cited by: 39
Authors
Miao, Jing [1 ]
Thongprayoon, Charat [1 ]
Valencia, Oscar A. Garcia [1 ]
Krisanapan, Pajaree [1 ]
Sheikh, Mohammad S. [1 ]
Davis, Paul W. [1 ]
Mekraksakit, Poemlarp [1 ]
Suarez, Maria Gonzalez [1 ]
Craici, Iasmina M. [1 ]
Cheungpasitporn, Wisit [1 ]
Affiliations
[1] Mayo Clin, Dept Med, Div Nephrol & Hypertens, Rochester, MN 55902 USA
Keywords
clinical nephrology; kidney disease; medical education;
DOI
10.2215/CJN.0000000000000330
CLC Classification
R5 [Internal Medicine]; R69 [Urology (Genitourinary Diseases)];
Subject Classification
1002 ; 100201 ;
Abstract
Background: ChatGPT is a novel tool that allows people to converse with an advanced machine learning model. Its performance on the US Medical Licensing Examination is comparable with that of a successful candidate, but its performance in nephrology remains undetermined. This study assessed ChatGPT's ability to answer nephrology test questions.
Methods: Multiple-choice, single-answer questions were sourced from the Nephrology Self-Assessment Program and the Kidney Self-Assessment Program; questions containing visual elements were excluded. Each question bank was run twice using GPT-3.5 and GPT-4. Performance was assessed using the total accuracy rate, defined as the percentage of questions ChatGPT answered correctly in either the first or second run, and the total concordance, defined as the percentage of questions for which ChatGPT gave identical answers in both runs, regardless of correctness.
Results: A total of 975 questions were assessed: 508 from the Nephrology Self-Assessment Program and 467 from the Kidney Self-Assessment Program. GPT-3.5 achieved a total accuracy rate of 51%, with a higher rate on the Nephrology Self-Assessment Program than on the Kidney Self-Assessment Program (58% versus 44%; P < 0.001). The total concordance rate across all questions was 78%; correct answers showed higher concordance (84%) than incorrect answers (73%) (P < 0.001). Across nephrology subfields, total accuracy rates were relatively lower in electrolyte and acid-base disorders, glomerular disease, and kidney-related bone and stone disorders. GPT-4's total accuracy rate was 74%, higher than GPT-3.5's (P < 0.001) but still below the passing threshold and the average score of nephrology examinees (77%).
Conclusions: ChatGPT showed limitations in accuracy and repeatability when answering nephrology-related questions, with performance varying across subfields.
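The two metrics defined in the Methods can be sketched as follows. This is an illustrative example, not the authors' code; the answer key and model runs below are hypothetical.

```python
# Sketch of the two metrics from the abstract, computed over two
# independent runs of a model on the same multiple-choice question bank.

def total_accuracy(run1, run2, key):
    """Fraction of questions answered correctly in either run."""
    correct = sum(1 for a, b, k in zip(run1, run2, key) if a == k or b == k)
    return correct / len(key)

def total_concordance(run1, run2):
    """Fraction of questions given the same answer in both runs,
    regardless of correctness."""
    same = sum(1 for a, b in zip(run1, run2) if a == b)
    return same / len(run1)

# Hypothetical 5-question bank: answer key and two model runs.
key  = ["A", "C", "B", "D", "A"]
run1 = ["A", "C", "D", "D", "B"]
run2 = ["A", "B", "D", "D", "B"]

print(total_accuracy(run1, run2, key))   # 0.6 (Q1, Q2, Q4 correct at least once)
print(total_concordance(run1, run2))     # 0.8 (identical answers on all but Q2)
```

Note that concordance is computed over all questions, so a model can be highly concordant while repeating the same wrong answer, which is why the study reports concordance separately for correct and incorrect answers.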
Pages: 35-43 (9 pages)
Related Papers
50 records
  • [41] The performance of ChatGPT in generating answers to clinical questions in psychiatry: a two-layer assessment
    Luykx, Jurjen J.
    Gerritse, Frank
    Habets, Philippe C.
    Vinkers, Christiaan H.
    WORLD PSYCHIATRY, 2023, 22 (03) : 479 - 480
  • [42] Performance of ChatGPT on Responding to Common Online Questions Regarding Key Information Gaps in Glaucoma
    Wu, Jo-Hsuan
    Nishida, Takashi
    Moghimi, Sasan
    Weinreb, Robert N.
    JOURNAL OF GLAUCOMA, 2024, 33 (07) : e54 - e56
  • [43] Evaluating ChatGPT's Performance in Answering Questions About Allergic Rhinitis and Chronic Rhinosinusitis
    Ye, Fan
    Zhang, He
    Luo, Xin
    Wu, Tong
    Yang, Qintai
    Shi, Zhaohui
    OTOLARYNGOLOGY-HEAD AND NECK SURGERY, 2024, 171 (02) : 571 - 577
  • [44] Analyzing Question Characteristics Influencing ChatGPT's Performance in 3000 USMLE®-Style Questions
    Alfertshofer, Michael
    Knoedler, Samuel
    Hoch, Cosima C.
    Cotofana, Sebastian
    Panayi, Adriana C.
    Kauke-Navarro, Martin
    Tullius, Stefan G.
    Orgill, Dennis P.
    Austen, William G.
    Pomahac, Bohdan
    Knoedler, Leonard
    MEDICAL SCIENCE EDUCATOR, 2025, 35 (01) : 257 - 267
  • [45] The performance of ChatGPT-4 and Bing Chat in frequently asked questions about glaucoma
    Dogan, Levent
    Yilmaz, Ibrahim Edhem
    EUROPEAN JOURNAL OF OPHTHALMOLOGY, 2025,
  • [46] Evaluating the Performance of ChatGPT in answering questions related to benign prostate hyperplasia and prostate cancer
    Caglar, Ufuk
    Yildiz, Oguzhan
    Meric, Arda
    Ayranci, Ali
    Yusuf, Resit
    Sarilar, Omer
    Ozgor, Faruk
    MINERVA UROLOGY AND NEPHROLOGY, 2023, 75 (06) : 729 - 733
  • [47] Innovating Personalized Nephrology Care: Exploring the Potential Utilization of ChatGPT
    Miao, Jing
    Thongprayoon, Charat
    Suppadungsuk, Supawadee
    Garcia Valencia, Oscar A.
    Qureshi, Fawad
    Cheungpasitporn, Wisit
    JOURNAL OF PERSONALIZED MEDICINE, 2023, 13 (12)
  • [48] ChatGPT: A test drive
    Wang, J.
    AMERICAN JOURNAL OF PHYSICS, 2023, 91 (04) : 255 - 256
  • [49] Issues for consideration about use of ChatGPT. Comment on 'Performance of ChatGPT on Specialty Certificate Examination in Dermatology multiple-choice questions'
    Kleebayoon, Amnuay
    Wiwanitkit, Viroj
    CLINICAL AND EXPERIMENTAL DERMATOLOGY, 2023,
  • [50] A comparative evaluation of ChatGPT 3.5 and ChatGPT 4 in responses to selected genetics questions
    McGrath, Scott P.
    Kozel, Beth A.
    Gracefo, Sara
    Sutherland, Nykole
    Danford, Christopher J.
    Walton, Nephi
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (10) : 2271 - 2283