New Artificial Intelligence ChatGPT Performs Poorly on the 2022 Self-assessment Study Program for Urology

Cited by: 41
Authors
Huynh, Linda My [1 ]
Bonebrake, Benjamin T. [2 ]
Schultis, Kaitlyn [2 ]
Quach, Alan [3 ]
Deibert, Christopher M. [3 ,4 ]
Affiliations
[1] Univ Nebraska Med Ctr, Omaha, NE USA
[2] Univ Nebraska Med Ctr, Coll Med, Omaha, NE USA
[3] Univ Nebraska Med Ctr, Div Urol, Omaha, NE USA
[4] Univ Nebraska Med Ctr, Dept Surg, Div Urol, 987521 Nebraska Med Ctr, Omaha, NE 68198 USA
Keywords
artificial intelligence; medical informatics applications; urology;
D O I
10.1097/UPJ.0000000000000406
CLC Number
R5 [Internal Medicine]; R69 [Urology (Genitourinary Diseases)];
Discipline Code
1002; 100201;
Abstract
Introduction: Large language models have demonstrated impressive capabilities, but their application to medicine remains unclear. We sought to evaluate the use of ChatGPT on the American Urological Association Self-assessment Study Program as an educational adjunct for urology trainees and practicing physicians.
Methods: One hundred fifty questions from the 2022 Self-assessment Study Program exam were screened, and those containing visual assets (n=15) were removed. The remaining items were encoded as open ended or multiple choice. ChatGPT's output was coded as correct, incorrect, or indeterminate; if indeterminate, responses were regenerated up to 2 times. Concordance, quality, and accuracy were ascertained by 3 independent researchers and reviewed by 2 physician adjudicators. A new session was started for each entry to avoid crossover learning.
Results: ChatGPT was correct on 36/135 (26.7%) open-ended and 38/135 (28.2%) multiple-choice questions. Indeterminate responses were generated in 40 (29.6%) and 4 (3.0%), respectively. Of the correct responses, 24/36 (66.7%) and 36/38 (94.7%) were on initial output, 8 (22.2%) and 1 (2.6%) on second output, and 4 (11.1%) and 1 (2.6%) on final output, respectively. Although regeneration decreased indeterminate responses, the proportion of correct responses did not increase. For both open-ended and multiple-choice questions, ChatGPT provided consistent justifications for incorrect answers and remained equally concordant across correct and incorrect answers.
Conclusions: ChatGPT previously demonstrated promise on medical licensing exams; however, comparable performance on the 2022 Self-assessment Study Program was not demonstrated. Performance was better on multiple-choice than on open-ended questions. More concerning were the persistent justifications offered for incorrect responses: left unchecked, use of ChatGPT in medicine may facilitate the spread of medical misinformation.
Pages: 408+
Number of Pages: 8
Related Papers (45 total)
  • [2] Google Bard Artificial Intelligence vs the 2022 Self-Assessment Study Program for Urology
    Huynh, Linda My
    Bonebrake, Benjamin T.
    Schultis, Kaitlyn
    Quach, Alan
    Deibert, Christopher M.
    UROLOGY PRACTICE, 2023, 10 (06)
  • [3] Artificial Intelligence on the Exam Table: ChatGPT's Advancement in Urology Self-assessment
    Cadiente, Angelo
    Chen, Jamie
    Nguyen, Jennifer
    Sadeghi-Nejad, Hossein
    Billah, Mubashir
    UROLOGY PRACTICE, 2023, 10 (06) : 521 - 523
  • [4] ChatGPT Performance on the American Urological Association Self-assessment Study Program and the Potential Influence of Artificial Intelligence in Urologic Training
    Deebel, Nicholas A.
    Terlecki, Ryan
    UROLOGY, 2023, 177 : 29 - 33
  • [5] ChatGPT Performance on the American Urological Association Self-assessment Study Program and the Potential Influence of Artificial Intelligence in Urologic Training EDITORIAL COMMENT
    Griebling, Tomas
    Kaplan, Damara
    UROLOGY, 2023, 177 : 33 - 33
  • [6] RETRACTION: Artificial intelligence on the exam table: ChatGPT's advancement in urology self-assessment (Retraction of Vol 10, Pg 521, 2023)
    Cadiente, A.
    Chen, J.
    Nguyen, J.
    Sadeghi-Nejad, H.
    Billah, M.
    UROLOGY PRACTICE, 2024, 11 (02) : 447 - 447
  • [7] Artificial Intelligence Literacy Competencies for Teachers Through Self-Assessment Tools
    Tenberga, Ieva
    Daniela, Linda
    SUSTAINABILITY, 2024, 16 (23)
  • [8] A Self-assessment Tool to Encourage the Uptake of Artificial Intelligence in Digital Workspaces
    Abu Naim, Belal
    Ghafourian, Yasin
    Tauber, Markus
    Lindner, Fabian
    Schmittner, Christoph
    Schoitsch, Erwin
    Schneider, Germar
    Kattan, Olga
    Reiner, Gerald
    Ryabokon, Anna
    Flamigni, Francesca
    Karathanasopoulou, Konstantina
    Dimitrakopoulos, George
    PROCEEDINGS OF 2024 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, NOMS 2024, 2024,
  • [9] Performance evaluation of ChatGPT 4.0 on cardiovascular questions from the medical knowledge self-assessment program
    Malkani, K.
    Zhang, R.
    Zhao, A.
    Jain, R.
    Collins, G. P.
    Parker, M.
    Maizes, D.
    Zhang, R.
    Kini, V.
    EUROPEAN HEART JOURNAL, 2024, 45
  • [10] Self-assessment of university students on the application and potential of Artificial Intelligence for their formation
    Aguilar, Nivia T. Alvarez
    Cubero, Arnulfo Trevino
    Elizondo, Jaime Arturo Castillo
    ATENAS, 2024, (62):