Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal

Cited by: 9
Authors
Noda, Ryunosuke [1]
Izaki, Yuto [1]
Kitano, Fumiya [1]
Komatsu, Jun [1]
Ichikawa, Daisuke [1]
Shibagaki, Yugo [1]
Affiliations
[1] St Marianna Univ, Dept Internal Med, Div Nephrol & Hypertens, Sch Med, 2-16-1 Sugao,Miyamae Ku, Kawasaki, Kanagawa 2168511, Japan
Keywords
ChatGPT; GPT-4; Large language models; Artificial intelligence; Nephrology
DOI
10.1007/s10157-023-02451-w
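Chinese Library Classification (CLC) number
R5 [Internal medicine]; R69 [Urology (diseases of the urogenital system)]
Discipline classification code
1002; 100201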
Abstract
Background: Large language models (LLMs) have impacted advances in artificial intelligence. While LLMs have demonstrated high performance in general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate ChatGPT and Bard in their potential nephrology applications.
Methods: Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and Bard. We calculated the correct answer rates for the five years, each year, and question categories and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of the nephrology residents.
Results: The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; thus, GPT-4 significantly outperformed GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 passed in three years, barely meeting the minimum threshold in two. GPT-4 demonstrated significantly higher performance in problem-solving, clinical, and non-image questions than GPT-3.5 and Bard. GPT-4's performance was between third- and fourth-year nephrology residents.
Conclusions: GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight LLMs' potential and limitations in nephrology. As LLMs advance, nephrologists should understand their performance for future applications.
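The overall rates reported in the abstract can be reproduced from the raw counts with a short sketch. The abstract does not name the statistical test used, so the pooled two-proportion z-test below is an assumption chosen for illustration, not the authors' method; the names `correct`, `rate_percent`, and `two_proportion_p` are hypothetical.

```python
from math import erfc, sqrt

TOTAL = 99  # number of questions, from the abstract
# Correct-answer counts reported in the abstract.
correct = {"GPT-3.5": 31, "GPT-4": 54, "Bard": 32}


def rate_percent(k: int, n: int) -> float:
    """Correct answer rate as a percentage, rounded to one decimal place."""
    return round(100 * k / n, 1)


def two_proportion_p(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test.

    Assumption: the source does not state which test was used;
    this is only one plausible way to compare two rates.
    """
    p_pool = (k1 + k2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (k1 / n1 - k2 / n2) / se
    # Two-sided tail probability of the standard normal distribution.
    return erfc(abs(z) / sqrt(2))


rates = {model: rate_percent(k, TOTAL) for model, k in correct.items()}
p_gpt4_vs_gpt35 = two_proportion_p(correct["GPT-4"], TOTAL, correct["GPT-3.5"], TOTAL)
p_gpt4_vs_bard = two_proportion_p(correct["GPT-4"], TOTAL, correct["Bard"], TOTAL)

for model, pct in rates.items():
    print(f"{model}: {pct}% ({correct[model]}/{TOTAL})")
print(f"GPT-4 vs GPT-3.5: p = {p_gpt4_vs_gpt35:.4f}")
print(f"GPT-4 vs Bard:    p = {p_gpt4_vs_bard:.4f}")
```

Under this assumed test, both comparisons fall below the 0.01 level, which is consistent with the reported p < 0.01; an exact test on the same counts could give somewhat different p-values.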
Pages: 465-469
Page count: 5