Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation

Cited by: 16
Authors
Gobira, Mauro [1 ]
Nakayama, Luis Filipe [2 ,3 ]
Moreira, Rodrigo [1 ]
Andrade, Eric [2 ]
Regatieri, Caio Vinicius Saito [2 ]
Belfort Jr, Rubens [2 ]
Affiliations
[1] Vis Inst, Inst Paulista Estudos & Pesquisas Oftalmol, Sao Paulo, SP, Brazil
[2] Univ Fed Sao Paulo, Dept Ophthalmol, Sao Paulo, SP, Brazil
[3] MIT, Inst Med Engn & Sci, Cambridge, MA 02142 USA
Keywords
Artificial intelligence; Education; Natural language processing
DOI
10.1590/1806-9282.20230848
Chinese Library Classification
R5 [Internal Medicine]
Discipline codes
1002; 100201
Abstract
OBJECTIVE: The aim of this study was to evaluate the performance of ChatGPT-4.0 in answering the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and as a tool to provide feedback on the quality of the examination.
METHODS: Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the test solutions, they classified the large language model's answers as adequate, inadequate, or indeterminate. In cases of disagreement, they adjudicated and reached a consensus decision on ChatGPT's accuracy. Performance across medical themes and on nullified questions was compared using chi-square analysis.
RESULTS: On the Revalida examination, ChatGPT-4.0 answered 71 questions (87.7%) correctly and 10 (12.3%) incorrectly. There was no statistically significant difference in the proportion of correct answers across medical themes (p=0.4886). The model had a lower accuracy of 71.4% on nullified questions, with no statistically significant difference between non-nullified and nullified groups (p=0.241).
CONCLUSION: ChatGPT-4.0 showed satisfactory performance on the 2022 Brazilian National Examination for Medical Degree Revalidation. The large language model performed worse on subjective questions and public healthcare themes. These results suggest that the overall quality of the Revalida examination questions is satisfactory and corroborate the decisions to nullify questions.
Pages: 5
Related Papers (50 records)
  • [41] ChatGPT-4 Performance on German Continuing Medical Education-Friend or Foe (Trick or Treat)? Protocol for a Randomized Controlled Trial
    Burisch, Christian
    Bellary, Abhav
    Breuckmann, Frank
    Ehlers, Jan
    Thal, Serge C.
    Sellmann, Timur
    Godde, Daniel
    JMIR RESEARCH PROTOCOLS, 2025, 14
  • [42] Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study
    Noda, Masao
    Ueno, Takayoshi
    Koshu, Ryota
    Takaso, Yuji
    Shimada, Mari Dias
    Saito, Chizu
    Sugimoto, Hisashi
    Fushiki, Hiroaki
    Ito, Makoto
    Nomura, Akihiro
    Yoshizaki, Tomokazu
    JMIR MEDICAL EDUCATION, 2024, 10
  • [43] Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study
    Rojas, Marcos
    Rojas, Marcelo
    Burgess, Valentina
    Toro-Perez, Javier
    Salehi, Shima
    JMIR MEDICAL EDUCATION, 2024, 10
  • [44] Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study
    Kernberg, Annessa
    Gold, Jeffrey A.
    Mohan, Vishnu
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [45] Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study
    Torres-Zegarra, Betzy Clariza
    Rios-Garcia, Wagner
    Nana-Cordova, Alvaro Micael
    Arteaga-Cisneros, Karen Fatima
    Chalco, Xiomara Cristina Benavente
    Ordonez, Marina Atena Bustamante
    Rios, Carlos Jesus Gutierrez
    Godoy, Carlos Alberto Ramos
    Quezada, Kristell Luisa Teresa Panta
    Gutierrez-Arratia, Jesus Daniel
    Flores-Cohaila, Javier Alejandro
    JOURNAL OF EDUCATIONAL EVALUATION FOR HEALTH PROFESSIONS, 2023, 20
  • [46] Performance evaluation of ChatGPT 4.0 on cardiovascular questions from the medical knowledge self-assessment program
    Malkani, K.
    Zhang, R.
    Zhao, A.
    Jain, R.
    Collins, G. P.
    Parker, M.
    Maizes, D.
    Zhang, R.
    Kini, V.
    EUROPEAN HEART JOURNAL, 2024, 45
  • [48] Performance of artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in the American Society for Metabolic and Bariatric Surgery textbook of bariatric surgery questions
    Lee, Yung
    Brar, Karanbir
    Malone, Sarah
    Jin, David
    McKechnie, Tyler
    Jung, James J.
    Kroh, Matthew
    Dang, Jerry T.
    SURGERY FOR OBESITY AND RELATED DISEASES, 2024, 20 (07) : 609 - 613
  • [49] ChatGPT (GPT-4) passed the Japanese National License Examination for Pharmacists in 2022, answering all items including those with diagrams: a descriptive study
    Sato, Hiroyasu
    Ogasawara, Katsuhiko
    JOURNAL OF EDUCATIONAL EVALUATION FOR HEALTH PROFESSIONS, 2024, 21
  • [50] Performance of ChatGPT on Chinese national medical licensing examinations: a five-year examination evaluation study for physicians, pharmacists and nurses
    Zong, Hui
    Li, Jiakun
    Wu, Erman
    Wu, Rongrong
    Lu, Junyu
    Shen, Bairong
    BMC MEDICAL EDUCATION, 2024, 24 (01)