Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal

Cited by: 9
Authors
Noda, Ryunosuke [1]
Izaki, Yuto [1]
Kitano, Fumiya [1]
Komatsu, Jun [1]
Ichikawa, Daisuke [1]
Shibagaki, Yugo [1]
Affiliations
[1] St Marianna Univ, Dept Internal Med, Div Nephrol & Hypertens, Sch Med, 2-16-1 Sugao,Miyamae Ku, Kawasaki, Kanagawa 2168511, Japan
Keywords
ChatGPT; GPT-4; Large language models; Artificial intelligence; Nephrology
DOI
10.1007/s10157-023-02451-w
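Chinese Library Classification (CLC) number
R5 [Internal medicine]; R69 [Urology (diseases of the urogenital system)]
Discipline classification code
1002; 100201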
Abstract
Background: Large language models (LLMs) have impacted advances in artificial intelligence. While LLMs have demonstrated high performance in general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate ChatGPT and Bard in their potential nephrology applications.

Methods: Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. Correct answer rates were calculated for the five years combined, for each year, and for each question category, and were checked against the pass criterion. The correct answer rates were also compared with those of nephrology residents.

Results: The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively. GPT-4 significantly outperformed both GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 passed in three of the five years, barely meeting the minimum threshold in two. GPT-4 demonstrated significantly higher performance than GPT-3.5 and Bard in problem-solving, clinical, and non-image questions. GPT-4's performance fell between that of third- and fourth-year nephrology residents.

Conclusions: GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight LLMs' potential and limitations in nephrology. As LLMs advance, nephrologists should understand their performance for future applications.
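The reported rates and the direction of the significance claims can be checked from the counts given in the abstract alone. The sketch below is only an illustration: the abstract does not say which statistical test the authors used, so the Pearson chi-square test on two unpaired proportions (1 degree of freedom, no continuity correction) is an assumption. Because all three models answered the same 99 questions, a paired test would arguably be more appropriate, but that needs per-question results that the abstract does not provide. The function names `rate` and `chi2_two_proportions` are hypothetical.

```python
import math

# Correct answers out of 99 questions, as reported in the abstract.
TOTAL = 99
correct = {"GPT-3.5": 31, "GPT-4": 54, "Bard": 32}


def rate(n_correct, total=TOTAL):
    """Correct answer rate in percent, rounded to one decimal place."""
    return round(100 * n_correct / total, 1)


def chi2_two_proportions(a_correct, b_correct, n=TOTAL):
    """Pearson chi-square test (df = 1, no continuity correction) for two
    groups of n questions each. ASSUMPTION: this is not necessarily the
    test used in the study; it treats the two groups as independent."""
    table = [
        [a_correct, n - a_correct],
        [b_correct, n - b_correct],
    ]
    col_totals = [a_correct + b_correct, 2 * n - a_correct - b_correct]
    chi2 = 0.0
    for row in table:
        for observed, col_total in zip(row, col_totals):
            # Expected count = row total * column total / grand total.
            expected = n * col_total / (2 * n)
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    for model, n_correct in correct.items():
        print(f"{model}: {rate(n_correct)}% ({n_correct}/{TOTAL})")
    for other in ("GPT-3.5", "Bard"):
        chi2, p = chi2_two_proportions(correct["GPT-4"], correct[other])
        print(f"GPT-4 vs {other}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under these assumptions both comparisons give p-values below 0.01, which agrees with the thresholds reported in the abstract, though the exact values may differ from those in the paper.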
Pages: 465-469
Page count: 5