Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation

Cited by: 16
Authors
Gobira, Mauro [1 ]
Nakayama, Luis Filipe [2 ,3 ]
Moreira, Rodrigo [1 ]
Andrade, Eric [2 ]
Regatieri, Caio Vinicius Saito [2 ]
Belfort Jr, Rubens [2 ]
Affiliations
[1] Vis Inst, Inst Paulista Estudos & Pesquisas Oftalmol, Sao Paulo, SP, Brazil
[2] Univ Fed Sao Paulo, Dept Ophthalmol, Sao Paulo, SP, Brazil
[3] MIT, Inst Med Engn & Sci, Cambridge, MA 02142 USA
Source: REVISTA DA ASSOCIACAO MEDICA BRASILEIRA
Keywords
Artificial intelligence; Education; Natural language processing
DOI
10.1590/1806-9282.20230848
Chinese Library Classification: R5 [Internal Medicine]
Subject Classification Codes: 1002; 100201
Abstract
OBJECTIVE: The aim of this study was to evaluate the performance of ChatGPT-4.0 in answering the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and to use it as a tool to provide feedback on the quality of the examination.
METHODS: Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the test solutions, they classified the large language model's answers as adequate, inadequate, or indeterminate. In cases of disagreement, they adjudicated and reached a consensus decision on the ChatGPT accuracy. Performance across medical themes and on nullified questions was compared using chi-square analysis.
RESULTS: In the Revalida examination, ChatGPT-4.0 answered 71 questions (87.7%) correctly and 10 (12.3%) incorrectly. There was no statistically significant difference in the proportion of correct answers among medical themes (p=0.4886). The artificial intelligence model had a lower accuracy of 71.4% on nullified questions, with no statistical difference (p=0.241) between the non-nullified and nullified groups.
CONCLUSION: ChatGPT-4.0 showed satisfactory performance on the 2022 Brazilian National Examination for Medical Degree Revalidation. The large language model performed worse on subjective questions and public healthcare themes. These results suggest that the overall quality of the Revalida examination questions is satisfactory and corroborate the decision to nullify the disputed questions.
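The between-group comparison described above (nullified versus non-nullified questions) amounts to a standard chi-square test of independence on a 2x2 contingency table. The following minimal Python sketch, which is not the authors' code, shows how such a comparison can be run with scipy; the per-group counts are illustrative assumptions reverse-engineered from the reported figures (71 of 81 answers correct overall, 71.4% accuracy on nullified items), so the resulting p-value will not necessarily match the published one.

from scipy.stats import chi2_contingency

# Assumed (illustrative) counts; rows = question group, columns = [correct, incorrect].
# 5/7 is approximately 71.4%, matching the reported accuracy on nullified questions;
# the remaining 66 correct and 8 incorrect answers are attributed to the non-nullified group.
observed = [
    [66, 8],  # non-nullified questions (assumed split)
    [5, 2],   # nullified questions (assumed split)
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, p = {p:.3f}, dof = {dof}")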
Pages: 5
Related Papers (showing 10 of 50 records)
  • [1] Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation
    Wiwanitkit, Somsri
    Wiwanitkit, Viroj
    REVISTA DA ASSOCIACAO MEDICA BRASILEIRA, 2024, 70 (03)
  • [2] Evaluation of ChatGPT-4 Performance in Answering Patients' Questions About the Management of Type 2 Diabetes
    Gokbulut, Puren
    Kuskonmaz, Serife Mehlika
    Onder, Cagatay Emir
    Taskaldiran, Isilay
    Koc, Gonul
    MEDICAL BULLETIN OF SISLI ETFAL HOSPITAL, 2024, 58 (04): 483-490
  • [3] AI IN HEPATOLOGY: A COMPARATIVE ANALYSIS OF CHATGPT-4, BING, AND BARD AT ANSWERING CLINICAL QUESTIONS
    Anvari, Sama
    Lee, Yung
    Jin, David S.
    Malone, Sarah
    Collins, Matthew
    GASTROENTEROLOGY, 2024, 166 (05): S888-S888
  • [4] Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study
    Wang, Ying-Mei
    Shen, Hung-Wei
    Chen, Tzeng-Ji
    Chiang, Shu-Chiung
    Lin, Ting-Guan
    JMIR MEDICAL EDUCATION, 2025, 11
  • [5] Augmenting Medical Education: An Evaluation of GPT-4 and ChatGPT in Answering Rheumatology Questions from the Spanish Medical Licensing Examination
    Madrid Garcia, Alfredo
    Rosales, Zulema
    Freites, Dalifer
    Perez Sancristobal, Ines
    Fernandez, Benjamin
    Rodriguez Rodriguez, Luis
    ARTHRITIS & RHEUMATOLOGY, 2023, 75: 4095-4097
  • [6] Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions
    Sarangi, Pradosh Kumar
    Datta, Suvrankar
    Panda, Braja Behari
    Panda, Swaha
    Mondal, Himel
    INDIAN JOURNAL OF RADIOLOGY AND IMAGING, 2024,
  • [7] Artificial intelligence in hepatology: a comparative analysis of ChatGPT-4, Bing, and Bard at answering clinical questions
    Anvari, Sama
    Lee, Yung
    Jin, David Shiqiang
    Malone, Sarah
    Collins, Matthew
    JOURNAL OF THE CANADIAN ASSOCIATION OF GASTROENTEROLOGY, 2025,
  • [8] ChatGPT-4 Omni's superiority in answering multiple-choice oral radiology questions
    Tassoker, Melek
    BMC ORAL HEALTH, 2025, 25 (01)
  • [9] Exploring the proficiency of ChatGPT-4: An evaluation of its performance in the Taiwan advanced medical licensing examination
    Lin, Shih-Yi
    Chan, Pak Ki
    Hsu, Wu-Huei
    Kao, Chia-Hung
    DIGITAL HEALTH, 2024, 10
  • [10] This too shall pass: the performance of ChatGPT-3.5, ChatGPT-4 and New Bing in an Australian medical licensing examination
    Kleinig, Oliver
    Gao, Christina
    Bacchi, Stephen
    MEDICAL JOURNAL OF AUSTRALIA, 2023, 219 (05)