Performance of ChatGPT on Nephrology Test Questions

Cited by: 39
Authors
Miao, Jing [1 ]
Thongprayoon, Charat [1 ]
Valencia, Oscar A. Garcia [1 ]
Krisanapan, Pajaree [1 ]
Sheikh, Mohammad S. [1 ]
Davis, Paul W. [1 ]
Mekraksakit, Poemlarp [1 ]
Suarez, Maria Gonzalez [1 ]
Craici, Iasmina M. [1 ]
Cheungpasitporn, Wisit [1 ]
Affiliations
[1] Mayo Clin, Dept Med, Div Nephrol & Hypertens, Rochester, MN 55902 USA
Keywords
clinical nephrology; kidney disease; medical education;
DOI
10.2215/CJN.0000000000000330
CLC classification
R5 [Internal Medicine]; R69 [Urology (urogenital system diseases)];
Subject classification
1002; 100201;
Abstract
Background: ChatGPT is a novel tool that allows users to converse with an advanced machine learning model. Its performance on the US Medical Licensing Examination is comparable to that of a successful candidate, but its performance in nephrology remains undetermined. This study assessed ChatGPT's ability to answer nephrology test questions.
Methods: Questions were sourced from the Nephrology Self-Assessment Program and the Kidney Self-Assessment Program, both of which consist of multiple-choice, single-answer questions. Questions containing visual elements were excluded. Each question bank was run twice using GPT-3.5 and GPT-4. Performance was assessed with the total accuracy rate, defined as the percentage of questions ChatGPT answered correctly in either the first or second run, and the total concordance, defined as the percentage of questions for which ChatGPT gave identical answers in both runs, regardless of correctness.
Results: A total of 975 questions were assessed: 508 from the Nephrology Self-Assessment Program and 467 from the Kidney Self-Assessment Program. GPT-3.5 achieved a total accuracy rate of 51%, with higher accuracy on the Nephrology Self-Assessment Program than on the Kidney Self-Assessment Program (58% versus 44%; P < 0.001). The total concordance rate across all questions was 78%; correct answers showed higher concordance (84%) than incorrect answers (73%) (P < 0.001). Across nephrology subfields, total accuracy rates were relatively lower in electrolyte and acid-base disorders, glomerular disease, and kidney-related bone and stone disorders. GPT-4 achieved a total accuracy rate of 74%, higher than GPT-3.5 (P < 0.001) but still below the passing threshold and the average score of nephrology examinees (77%).
Conclusions: ChatGPT exhibited limitations in accuracy and repeatability when addressing nephrology-related questions, with performance varying across subfields.
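The two metrics defined in the abstract are straightforward to compute. A minimal sketch (function and variable names are illustrative, not from the paper): given the answers from two runs and the answer key, total accuracy counts a question as correct if either run matched the key, while total concordance counts questions where the two runs agreed with each other, regardless of correctness.

```python
def total_accuracy(run1, run2, key):
    """Percent of questions answered correctly in either run
    (the 'total accuracy rate' as defined in the abstract)."""
    assert len(run1) == len(run2) == len(key)
    correct = sum(1 for a, b, k in zip(run1, run2, key) if a == k or b == k)
    return 100.0 * correct / len(key)

def total_concordance(run1, run2):
    """Percent of questions where both runs gave the same answer,
    regardless of correctness (the 'total concordance' definition)."""
    assert len(run1) == len(run2)
    same = sum(1 for a, b in zip(run1, run2) if a == b)
    return 100.0 * same / len(run1)

# Toy example with four questions:
run1 = ["A", "B", "C", "D"]
run2 = ["A", "C", "C", "D"]
key  = ["A", "B", "D", "D"]
print(total_accuracy(run1, run2, key))   # question 3 missed in both runs
print(total_concordance(run1, run2))     # runs disagree only on question 2
```

Note that concordance is computed over all questions, which is why the abstract can report separate concordance rates for the correct and incorrect subsets.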
Pages: 35-43
Page count: 9
Related papers
(50 records)
  • [1] Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal
    Noda, Ryunosuke
    Izaki, Yuto
    Kitano, Fumiya
    Komatsu, Jun
    Ichikawa, Daisuke
    Shibagaki, Yugo
    CLINICAL AND EXPERIMENTAL NEPHROLOGY, 2024, 28 (05) : 465 - 469
  • [2] INTERVENTIONAL NEPHROLOGY ASSESSMENT QUESTIONS: A PERFORMANCE EVALUATION AND COMPARATIVE ANALYSIS OF CHATGPT-3.5 AND GPT-4
    Sheikh, Mohammad
    Qureshi, Fawad
    Thongprayoon, Charat
    Suarez, Lourdes Gonzalez
    Craici, Iasmina
    Cheungpasitporn, Wisit
    AMERICAN JOURNAL OF KIDNEY DISEASES, 2024, 83 (04) : S100 - S101
  • [3] Enhancing Large Language Models (LLM) Performance in Nephrology through Prompt Engineering: A Comparative Analysis of ChatGPT-4 Responses in Answering AKI and Critical Care Nephrology Questions
    Sheikh, M. Salman
    Thongprayoon, Charat
    Qureshi, Fawad
    Abdelgadir, Yasir
    Craici, Iasmina
    Kashani, Kianoush
    Cheungpasitporn, Wisit
    JOURNAL OF THE AMERICAN SOCIETY OF NEPHROLOGY, 2024, 35 (10):
  • [4] Evaluating the performance of ChatGPT in answering questions related to urolithiasis
    Hakan Cakir
    Ufuk Caglar
    Oguzhan Yildiz
    Arda Meric
    Ali Ayranci
    Faruk Ozgor
    International Urology and Nephrology, 2024, 56 : 17 - 21
  • [5] Low Performance of ChatGPT on Echocardiography Board Review Questions
    Kangiszer, Gyula
    Mahtani, Arun Umesh
    Pintea, Mark
    Jacobs, Charlotte
    Sragovicz, Hannah
    Nguyen, Tai
    Yeturu, Sahithi
    Lieberman, Madison
    Waldman, Carly
    Bhavnani, Sanjeev P.
    Hermel, Melody
    JACC-CARDIOVASCULAR IMAGING, 2024, 17 (03) : 330 - 332
  • [6] Evaluating the performance of ChatGPT in answering questions related to urolithiasis
    Cakir, Hakan
    Caglar, Ufuk
    Yildiz, Oguzhan
    Meric, Arda
    Ayranci, Ali
    Ozgor, Faruk
    INTERNATIONAL UROLOGY AND NEPHROLOGY, 2024, 56 (01) : 17 - 21
  • [7] Performance of ChatGPT on basic healthcare leadership and management questions
    Leutz-Schmidt, Patricia
    Groezinger, Martin
    Kauczor, Hans-Ulrich
    Jang, Hyungseok
    Sedaghat, Sam
    HEALTH AND TECHNOLOGY, 2024, 14 (06) : 1161 - 1166
  • [8] Performance of ChatGPT on Solving Orthopedic Board-Style Questions: A Comparative Analysis of ChatGPT 3.5 and ChatGPT 4
    Kim, Sung Eun
    Lee, Ji Han
    Choi, Byung Sun
    Han, Hyuk-Soo
    Lee, Myung Chul
    Ro, Du Hyun
    CLINICS IN ORTHOPEDIC SURGERY, 2024, 16 (04) : 669 - 673
  • [9] Unanswered questions in nephrology
    Al-Awqati, QA
    KIDNEY INTERNATIONAL, 2006, 69 (04) : 637 - 638
  • [10] Evaluating the performance of ChatGPT in answering questions related to pediatric urology
    Caglar, Ufuk
    Yildiz, Oguzhan
    Meric, Arda
    Ayranci, Ali
    Gelmis, Mucahit
    Sarilar, Omer
    Ozgor, Faruk
    JOURNAL OF PEDIATRIC UROLOGY, 2024, 20 (01) : 26.e1 - 26.e5