Performance of ChatGPT on Nephrology Test Questions

Cited: 39
Authors
Miao, Jing [1 ]
Thongprayoon, Charat [1 ]
Valencia, Oscar A. Garcia [1 ]
Krisanapan, Pajaree [1 ]
Sheikh, Mohammad S. [1 ]
Davis, Paul W. [1 ]
Mekraksakit, Poemlarp [1 ]
Suarez, Maria Gonzalez [1 ]
Craici, Iasmina M. [1 ]
Cheungpasitporn, Wisit [1 ]
Affiliations
[1] Mayo Clin, Dept Med, Div Nephrol & Hypertens, Rochester, MN 55902 USA
Keywords
clinical nephrology; kidney disease; medical education
DOI
10.2215/CJN.0000000000000330
Chinese Library Classification (CLC)
R5 [Internal Medicine]; R69 [Urology (Urogenital Diseases)]
Subject Classification Code
1002; 100201
Abstract
Background: ChatGPT is a novel tool that allows people to engage in conversations with an advanced machine learning model. ChatGPT's performance on the US Medical Licensing Examination is comparable to that of a successful candidate, but its performance in nephrology remains undetermined. This study assessed ChatGPT's ability to answer nephrology test questions.

Methods: Multiple-choice, single-answer questions were sourced from the Nephrology Self-Assessment Program and the Kidney Self-Assessment Program; questions containing visual elements were excluded. Each question bank was run twice through GPT-3.5 and GPT-4. Performance was assessed using the total accuracy rate, defined as the percentage of questions ChatGPT answered correctly in either the first or second run, and the total concordance, defined as the percentage of questions for which ChatGPT gave identical answers in both runs, regardless of correctness.

Results: A total of 975 questions were assessed, comprising 508 from the Nephrology Self-Assessment Program and 467 from the Kidney Self-Assessment Program. GPT-3.5 achieved a total accuracy rate of 51%, performing better on Nephrology Self-Assessment Program questions than on Kidney Self-Assessment Program questions (58% versus 44%; P < 0.001). The total concordance rate across all questions was 78%, and correct answers showed higher concordance than incorrect answers (84% versus 73%; P < 0.001). Across nephrology subfields, total accuracy was relatively lower in electrolyte and acid-base disorders, glomerular disease, and kidney-related bone and stone disorders. GPT-4 achieved a total accuracy rate of 74%, higher than GPT-3.5 (P < 0.001) but still below the passing threshold and the average score of nephrology examinees (77%).

Conclusions: ChatGPT exhibited limitations in accuracy and repeatability when answering nephrology-related questions, and performance varied across subfields.
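The two headline metrics are straightforward to operationalize. Below is a minimal sketch, not the authors' code: `run1`, `run2`, and `answer_key` are hypothetical lists of answer letters from two repeated runs plus the key, and the chi-square counts are reconstructed from the reported percentages and question totals for illustration only.

```python
# Minimal sketch of the abstract's two metrics, computed from two
# hypothetical runs over the same question bank. `run1`, `run2`, and
# `answer_key` are illustrative lists of answer letters ("A"-"E");
# this is not the authors' code.
from scipy.stats import chi2_contingency


def total_accuracy(run1, run2, answer_key):
    """Percent of questions answered correctly in either the first or second run."""
    correct = sum(a1 == key or a2 == key
                  for a1, a2, key in zip(run1, run2, answer_key))
    return 100.0 * correct / len(answer_key)


def total_concordance(run1, run2):
    """Percent of questions given identical answers in both runs, right or wrong."""
    same = sum(a1 == a2 for a1, a2 in zip(run1, run2))
    return 100.0 * same / len(run1)


# Comparing accuracy between the two question banks with a chi-square test,
# mirroring the reported 58% versus 44% comparison. The correct/incorrect
# counts are reconstructed from the reported percentages and totals (508, 467).
nephsap = [295, 508 - 295]  # ~58% correct of 508 questions
ksap = [205, 467 - 205]     # ~44% correct of 467 questions
chi2, p, _, _ = chi2_contingency([nephsap, ksap])
print(f"chi-square = {chi2:.2f}, P = {p:.3g}")
```

Under this reconstruction the test yields P well below 0.001, consistent with the between-bank comparison reported in the abstract.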
Pages: 35-43
Page count: 9