Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation

Cited by: 16
Authors
Gobira, Mauro [1 ]
Nakayama, Luis Filipe [2 ,3 ]
Moreira, Rodrigo [1 ]
Andrade, Eric [2 ]
Regatieri, Caio Vinicius Saito [2 ]
Belfort Jr, Rubens [2 ]
Affiliations
[1] Vis Inst, Inst Paulista Estudos & Pesquisas Oftalmol, Sao Paulo, SP, Brazil
[2] Univ Fed Sao Paulo, Dept Ophthalmol, Sao Paulo, SP, Brazil
[3] MIT, Inst Med Engn & Sci, Cambridge, MA 02142 USA
Source: REVISTA DA ASSOCIACAO MEDICA BRASILEIRA
Keywords
Artificial intelligence; Education; Natural language processing
DOI
10.1590/1806-9282.20230848
Chinese Library Classification: R5 [Internal Medicine]
Subject Classification Codes: 1002; 100201
Abstract
OBJECTIVE: The aim of this study was to evaluate the performance of ChatGPT-4.0 in answering the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and to use it as a tool to provide feedback on the quality of the examination.
METHODS: Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the test solutions, they classified the large language model's answers as adequate, inadequate, or indeterminate. In cases of disagreement, they adjudicated and reached a consensus decision on the ChatGPT accuracy. Performance across medical themes and on nullified questions was compared using chi-square analysis.
RESULTS: In the Revalida examination, ChatGPT-4.0 answered 71 questions (87.7%) correctly and 10 (12.3%) incorrectly. There was no statistically significant difference in the proportion of correct answers among medical themes (p=0.4886). The artificial intelligence model had a lower accuracy of 71.4% on nullified questions, with no statistical difference (p=0.241) between the non-nullified and nullified groups.
CONCLUSION: ChatGPT-4.0 showed satisfactory performance on the 2022 Brazilian National Examination for Medical Degree Revalidation. The large language model performed worse on subjective questions and public healthcare themes. These results suggest that the overall quality of the Revalida examination questions is satisfactory and corroborate the decision to nullify the disputed questions.
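The between-group comparison described above (nullified versus non-nullified questions) amounts to a standard chi-square test of independence on a 2x2 contingency table. The following minimal Python sketch, which is not the authors' code, shows how such a comparison can be run with scipy; the per-group counts are illustrative assumptions reverse-engineered from the reported figures (71 of 81 answers correct overall, 71.4% accuracy on nullified items), so the resulting p-value will not necessarily match the published one.

from scipy.stats import chi2_contingency

# Assumed (illustrative) counts; rows = question group, columns = [correct, incorrect].
# 5/7 is approximately 71.4%, matching the reported accuracy on nullified questions;
# the remaining 66 correct and 8 incorrect answers are attributed to the non-nullified group.
observed = [
    [66, 8],  # non-nullified questions (assumed split)
    [5, 2],   # nullified questions (assumed split)
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}, dof = {dof}")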
Pages: 5
Related Papers (showing 10 of 50 records)
  • [1] Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation
    Wiwanitkit, Somsri
    Wiwanitkit, Viroj
    REVISTA DA ASSOCIACAO MEDICA BRASILEIRA, 2024, 70 (03)
  • [2] Evaluation of ChatGPT-4 Performance in Answering Patients' Questions About the Management of Type 2 Diabetes
    Gokbulut, Puren
    Kuskonmaz, Serife Mehlika
    Onder, Cagatay Emir
    Taskaldiran, Isilay
    Koc, Gonul
    MEDICAL BULLETIN OF SISLI ETFAL HOSPITAL, 2024, 58 (04): 483-490
  • [3] AI IN HEPATOLOGY: A COMPARATIVE ANALYSIS OF CHATGPT-4, BING, AND BARD AT ANSWERING CLINICAL QUESTIONS
    Anvari, Sama
    Lee, Yung
    Jin, David S.
    Malone, Sarah
    Collins, Matthew
    GASTROENTEROLOGY, 2024, 166 (05): S888-S888
  • [4] Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study
    Wang, Ying-Mei
    Shen, Hung-Wei
    Chen, Tzeng-Ji
    Chiang, Shu-Chiung
    Lin, Ting-Guan
    JMIR MEDICAL EDUCATION, 2025, 11
  • [5] Augmenting Medical Education: An Evaluation of GPT-4 and ChatGPT in Answering Rheumatology Questions from the Spanish Medical Licensing Examination
    Madrid Garcia, Alfredo
    Rosales, Zulema
    Freites, Dalifer
    Perez Sancristobal, Ines
    Fernandez, Benjamin
    Rodriguez Rodriguez, Luis
    ARTHRITIS & RHEUMATOLOGY, 2023, 75: 4095-4097
  • [6] Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions
    Sarangi, Pradosh Kumar
    Datta, Suvrankar
    Panda, Braja Behari
    Panda, Swaha
    Mondal, Himel
    INDIAN JOURNAL OF RADIOLOGY AND IMAGING, 2024,
  • [7] Artificial intelligence in hepatology: a comparative analysis of ChatGPT-4, Bing, and Bard at answering clinical questions
    Anvari, Sama
    Lee, Yung
    Jin, David Shiqiang
    Malone, Sarah
    Collins, Matthew
    JOURNAL OF THE CANADIAN ASSOCIATION OF GASTROENTEROLOGY, 2025,
  • [8] ChatGPT-4 Omni's superiority in answering multiple-choice oral radiology questions
    Tassoker, Melek
    BMC ORAL HEALTH, 2025, 25 (01)
  • [9] Exploring the proficiency of ChatGPT-4: An evaluation of its performance in the Taiwan advanced medical licensing examination
    Lin, Shih-Yi
    Chan, Pak Ki
    Hsu, Wu-Huei
    Kao, Chia-Hung
    DIGITAL HEALTH, 2024, 10
  • [10] This too shall pass: the performance of ChatGPT-3.5, ChatGPT-4 and New Bing in an Australian medical licensing examination
    Kleinig, Oliver
    Gao, Christina
    Bacchi, Stephen
    MEDICAL JOURNAL OF AUSTRALIA, 2023, 219 (05)