New Artificial Intelligence ChatGPT Performs Poorly on the 2022 Self-assessment Study Program for Urology

Cited by: 41
Authors
Huynh, Linda My [1 ]
Bonebrake, Benjamin T. [2 ]
Schultis, Kaitlyn [2 ]
Quach, Alan [3 ]
Deibert, Christopher M. [3 ,4 ]
Affiliations
[1] Univ Nebraska Med Ctr, Omaha, NE USA
[2] Univ Nebraska Med Ctr, Coll Med, Omaha, NE USA
[3] Univ Nebraska Med Ctr, Div Urol, Omaha, NE USA
[4] Univ Nebraska Med Ctr, Dept Surg, Div Urol, 987521 Nebraska Med Ctr, Omaha, NE 68198 USA
Keywords
artificial intelligence; medical informatics applications; urology;
D O I
10.1097/UPJ.0000000000000406
CLC Number
R5 [Internal Medicine]; R69 [Urology (Genitourinary Diseases)];
Discipline Code
1002; 100201;
Abstract
Introduction: Large language models have demonstrated impressive capabilities, but their application to medicine remains unclear. We sought to evaluate the use of ChatGPT on the American Urological Association Self-assessment Study Program as an educational adjunct for urology trainees and practicing physicians.
Methods: One hundred fifty questions from the 2022 Self-assessment Study Program exam were screened, and those containing visual assets (n=15) were removed. The remaining items were encoded as open ended or multiple choice. ChatGPT's output was coded as correct, incorrect, or indeterminate; if indeterminate, responses were regenerated up to 2 times. Concordance, quality, and accuracy were ascertained by 3 independent researchers and reviewed by 2 physician adjudicators. A new session was started for each entry to avoid crossover learning.
Results: ChatGPT was correct on 36/135 (26.7%) open-ended and 38/135 (28.2%) multiple-choice questions. Indeterminate responses were generated in 40 (29.6%) and 4 (3.0%), respectively. Of the correct responses, 24/36 (66.7%) and 36/38 (94.7%) were on initial output, 8 (22.2%) and 1 (2.6%) on second output, and 4 (11.1%) and 1 (2.6%) on final output, respectively. Although regeneration decreased indeterminate responses, the proportion of correct responses did not increase. For both open-ended and multiple-choice questions, ChatGPT provided consistent justifications for incorrect answers and remained equally concordant across correct and incorrect answers.
Conclusions: ChatGPT previously demonstrated promise on medical licensing exams; however, comparable performance on the 2022 Self-assessment Study Program was not demonstrated. Performance was better on multiple-choice than on open-ended questions. More concerning were the persistent justifications offered for incorrect responses: left unchecked, use of ChatGPT in medicine may facilitate the spread of medical misinformation.
Pages: 408+
Number of Pages: 8
Related Papers (45 total)
  • [2] Google Bard Artificial Intelligence vs the 2022 Self-Assessment Study Program for Urology
    Huynh, Linda My
    Bonebrake, Benjamin T.
    Schultis, Kaitlyn
    Quach, Alan
    Deibert, Christopher M.
    UROLOGY PRACTICE, 2023, 10 (06)
  • [3] Artificial Intelligence on the Exam Table: ChatGPT's Advancement in Urology Self-assessment
    Cadiente, Angelo
    Chen, Jamie
    Nguyen, Jennifer
    Sadeghi-Nejad, Hossein
    Billah, Mubashir
    UROLOGY PRACTICE, 2023, 10 (06) : 521 - 523
  • [4] ChatGPT Performance on the American Urological Association Self-assessment Study Program and the Potential Influence of Artificial Intelligence in Urologic Training
    Deebel, Nicholas A.
    Terlecki, Ryan
    UROLOGY, 2023, 177 : 29 - 33
  • [5] ChatGPT Performance on the American Urological Association Self-assessment Study Program and the Potential Influence of Artificial Intelligence in Urologic Training EDITORIAL COMMENT
    Griebling, Tomas
    Kaplan, Damara
    UROLOGY, 2023, 177 : 33 - 33
  • [6] RETRACTION: Artificial intelligence on the exam table: ChatGPT's advancement in urology self-assessment (Retraction of Vol 10, Pg 521, 2023)
    Cadiente, A.
    Chen, J.
    Nguyen, J.
    Sadeghi-Nejad, H.
    Billah, M.
    UROLOGY PRACTICE, 2024, 11 (02) : 447 - 447
  • [7] Artificial Intelligence Literacy Competencies for Teachers Through Self-Assessment Tools
    Tenberga, Ieva
    Daniela, Linda
    SUSTAINABILITY, 2024, 16 (23)
  • [8] A Self-assessment Tool to Encourage the Uptake of Artificial Intelligence in Digital Workspaces
    Abu Naim, Belal
    Ghafourian, Yasin
    Tauber, Markus
    Lindner, Fabian
    Schmittner, Christoph
    Schoitsch, Erwin
    Schneider, Germar
    Kattan, Olga
    Reiner, Gerald
    Ryabokon, Anna
    Flamigni, Francesca
    Karathanasopoulou, Konstantina
    Dimitrakopoulos, George
    PROCEEDINGS OF 2024 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, NOMS 2024, 2024,
  • [9] Performance evaluation of ChatGPT 4.0 on cardiovascular questions from the medical knowledge self-assessment program
    Malkani, K.
    Zhang, R.
    Zhao, A.
    Jain, R.
    Collins, G. P.
    Parker, M.
    Maizes, D.
    Zhang, R.
    Kini, V.
    EUROPEAN HEART JOURNAL, 2024, 45
  • [10] Self-assessment of university students on the application and potential of Artificial Intelligence for their formation
    Aguilar, Nivia T. Alvarez
    Cubero, Arnulfo Trevino
    Elizondo, Jaime Arturo Castillo
    ATENAS, 2024, (62):