Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation

Cited by: 16
Authors
Gobira, Mauro [1 ]
Nakayama, Luis Filipe [2 ,3 ]
Moreira, Rodrigo [1 ]
Andrade, Eric [2 ]
Regatieri, Caio Vinicius Saito [2 ]
Belfort Jr, Rubens [2 ]
Affiliations
[1] Vis Inst, Inst Paulista Estudos & Pesquisas Oftalmol, Sao Paulo, SP, Brazil
[2] Univ Fed Sao Paulo, Dept Ophthalmol, Sao Paulo, SP, Brazil
[3] MIT, Inst Med Engn & Sci, Cambridge, MA 02142 USA
Keywords
Artificial intelligence; Education; Natural language processing
DOI
10.1590/1806-9282.20230848
Chinese Library Classification
R5 [Internal Medicine]
Discipline codes
1002; 100201
Abstract
OBJECTIVE: The aim of this study was to evaluate the performance of ChatGPT-4.0 in answering the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and as a tool to provide feedback on the quality of the examination.
METHODS: Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the test solutions, they classified the large language model's answers as adequate, inadequate, or indeterminate. In cases of disagreement, they adjudicated and reached a consensus decision on ChatGPT's accuracy. Performance across medical themes and on nullified questions was compared using chi-square analysis.
RESULTS: On the Revalida examination, ChatGPT-4.0 answered 71 questions (87.7%) correctly and 10 (12.3%) incorrectly. There was no statistically significant difference in the proportion of correct answers across medical themes (p=0.4886). The model had a lower accuracy of 71.4% on nullified questions, with no statistically significant difference between non-nullified and nullified groups (p=0.241).
CONCLUSION: ChatGPT-4.0 showed satisfactory performance on the 2022 Brazilian National Examination for Medical Degree Revalidation. The large language model performed worse on subjective questions and public healthcare themes. These results suggest that the overall quality of the Revalida examination questions is satisfactory and corroborate the decisions to nullify questions.
Pages: 5
Related Papers (50 records)
  • [41] ChatGPT-4 Performance on German Continuing Medical Education-Friend or Foe (Trick or Treat)? Protocol for a Randomized Controlled Trial
    Burisch, Christian
    Bellary, Abhav
    Breuckmann, Frank
    Ehlers, Jan
    Thal, Serge C.
    Sellmann, Timur
    Godde, Daniel
    JMIR RESEARCH PROTOCOLS, 2025, 14
  • [42] Performance of GPT-4V in Answering the Japanese Otolaryngology Board Certification Examination Questions: Evaluation Study
    Noda, Masao
    Ueno, Takayoshi
    Koshu, Ryota
    Takaso, Yuji
    Shimada, Mari Dias
    Saito, Chizu
    Sugimoto, Hisashi
    Fushiki, Hiroaki
    Ito, Makoto
    Nomura, Akihiro
    Yoshizaki, Tomokazu
    JMIR MEDICAL EDUCATION, 2024, 10
  • [43] Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study
    Rojas, Marcos
    Rojas, Marcelo
    Burgess, Valentina
    Toro-Perez, Javier
    Salehi, Shima
    JMIR MEDICAL EDUCATION, 2024, 10
  • [44] Using ChatGPT-4 to Create Structured Medical Notes From Audio Recordings of Physician-Patient Encounters: Comparative Study
    Kernberg, Annessa
    Gold, Jeffrey A.
    Mohan, Vishnu
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [45] Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study
    Torres-Zegarra, Betzy Clariza
    Rios-Garcia, Wagner
    Nana-Cordova, Alvaro Micael
    Arteaga-Cisneros, Karen Fatima
    Chalco, Xiomara Cristina Benavente
    Ordonez, Marina Atena Bustamante
    Rios, Carlos Jesus Gutierrez
    Godoy, Carlos Alberto Ramos
    Quezada, Kristell Luisa Teresa Panta
    Gutierrez-Arratia, Jesus Daniel
    Flores-Cohaila, Javier Alejandro
    JOURNAL OF EDUCATIONAL EVALUATION FOR HEALTH PROFESSIONS, 2023, 20
  • [46] Performance evaluation of ChatGPT 4.0 on cardiovascular questions from the medical knowledge self-assessment program
    Malkani, K.
    Zhang, R.
    Zhao, A.
    Jain, R.
    Collins, G. P.
    Parker, M.
    Maizes, D.
    Zhang, R.
    Kini, V.
    EUROPEAN HEART JOURNAL, 2024, 45
  • [48] Performance of artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in the American Society for Metabolic and Bariatric Surgery textbook of bariatric surgery questions
    Lee, Yung
    Brar, Karanbir
    Malone, Sarah
    Jin, David
    McKechnie, Tyler
    Jung, James J.
    Kroh, Matthew
    Dang, Jerry T.
    SURGERY FOR OBESITY AND RELATED DISEASES, 2024, 20 (07) : 609 - 613
  • [49] ChatGPT (GPT-4) passed the Japanese National License Examination for Pharmacists in 2022, answering all items including those with diagrams: a descriptive study
    Sato, Hiroyasu
    Ogasawara, Katsuhiko
    JOURNAL OF EDUCATIONAL EVALUATION FOR HEALTH PROFESSIONS, 2024, 21
  • [50] Performance of ChatGPT on Chinese national medical licensing examinations: a five-year examination evaluation study for physicians, pharmacists and nurses
    Zong, Hui
    Li, Jiakun
    Wu, Erman
    Wu, Rongrong
    Lu, Junyu
    Shen, Bairong
    BMC MEDICAL EDUCATION, 2024, 24 (01)