Performance of ChatGPT on Nephrology Test Questions

Cited: 39
Authors
Miao, Jing [1 ]
Thongprayoon, Charat [1 ]
Valencia, Oscar A. Garcia [1 ]
Krisanapan, Pajaree [1 ]
Sheikh, Mohammad S. [1 ]
Davis, Paul W. [1 ]
Mekraksakit, Poemlarp [1 ]
Suarez, Maria Gonzalez [1 ]
Craici, Iasmina M. [1 ]
Cheungpasitporn, Wisit [1 ]
Affiliations
[1] Mayo Clin, Dept Med, Div Nephrol & Hypertens, Rochester, MN 55902 USA
Keywords
clinical nephrology; kidney disease; medical education
DOI
10.2215/CJN.0000000000000330
Chinese Library Classification (CLC)
R5 [Internal Medicine]; R69 [Urology (Urogenital Diseases)]
Subject Classification Code
1002; 100201
Abstract
Background: ChatGPT is a novel tool that allows people to engage in conversations with an advanced machine learning model. ChatGPT's performance on the US Medical Licensing Examination is comparable to that of a successful candidate, but its performance in nephrology remains undetermined. This study assessed ChatGPT's ability to answer nephrology test questions.

Methods: Multiple-choice, single-answer questions were sourced from the Nephrology Self-Assessment Program and the Kidney Self-Assessment Program; questions containing visual elements were excluded. Each question bank was run twice through GPT-3.5 and GPT-4. Performance was assessed using the total accuracy rate, defined as the percentage of questions ChatGPT answered correctly in either the first or second run, and the total concordance, defined as the percentage of questions for which ChatGPT gave identical answers in both runs, regardless of correctness.

Results: A total of 975 questions were assessed, comprising 508 from the Nephrology Self-Assessment Program and 467 from the Kidney Self-Assessment Program. GPT-3.5 achieved a total accuracy rate of 51%, performing better on Nephrology Self-Assessment Program questions than on Kidney Self-Assessment Program questions (58% versus 44%; P < 0.001). The total concordance rate across all questions was 78%, and correct answers showed higher concordance than incorrect answers (84% versus 73%; P < 0.001). Across nephrology subfields, total accuracy was relatively lower in electrolyte and acid-base disorders, glomerular disease, and kidney-related bone and stone disorders. GPT-4 achieved a total accuracy rate of 74%, higher than GPT-3.5 (P < 0.001) but still below the passing threshold and the average score of nephrology examinees (77%).

Conclusions: ChatGPT exhibited limitations in accuracy and repeatability when answering nephrology-related questions, and performance varied across subfields.
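The two headline metrics are straightforward to operationalize. Below is a minimal sketch, not the authors' code: `run1`, `run2`, and `answer_key` are hypothetical lists of answer letters from two repeated runs plus the key, and the chi-square counts are reconstructed from the reported percentages and question totals for illustration only.

```python
# Minimal sketch of the abstract's two metrics, computed from two
# hypothetical runs over the same question bank. `run1`, `run2`, and
# `answer_key` are illustrative lists of answer letters ("A"-"E");
# this is not the authors' code.
from scipy.stats import chi2_contingency


def total_accuracy(run1, run2, answer_key):
    """Percent of questions answered correctly in either the first or second run."""
    correct = sum(a1 == key or a2 == key
                  for a1, a2, key in zip(run1, run2, answer_key))
    return 100.0 * correct / len(answer_key)


def total_concordance(run1, run2):
    """Percent of questions given identical answers in both runs, right or wrong."""
    same = sum(a1 == a2 for a1, a2 in zip(run1, run2))
    return 100.0 * same / len(run1)


# Comparing accuracy between the two question banks with a chi-square test,
# mirroring the reported 58% versus 44% comparison. The correct/incorrect
# counts are reconstructed from the reported percentages and totals (508, 467).
nephsap = [295, 508 - 295]  # ~58% correct of 508 questions
ksap = [205, 467 - 205]     # ~44% correct of 467 questions
chi2, p, _, _ = chi2_contingency([nephsap, ksap])
print(f"chi-square = {chi2:.2f}, P = {p:.3g}")
```

Under this reconstruction the test yields P well below 0.001, consistent with the between-bank comparison reported in the abstract.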
Pages: 35-43
Page count: 9