Performance of ChatGPT-4 in answering questions from the Brazilian National Examination for Medical Degree Revalidation

Cited: 16
Authors
Gobira, Mauro [1 ]
Nakayama, Luis Filipe [2 ,3 ]
Moreira, Rodrigo [1 ]
Andrade, Eric [2 ]
Regatieri, Caio Vinicius Saito [2 ]
Belfort Jr, Rubens [2 ]
Affiliations
[1] Vis Inst, Inst Paulista Estudos & Pesquisas Oftalmol, Sao Paulo, SP, Brazil
[2] Univ Fed Sao Paulo, Dept Ophthalmol, Sao Paulo, SP, Brazil
[3] MIT, Inst Med Engn & Sci, Cambridge, MA 02142 USA
Keywords
Artificial intelligence; Education; Natural language processing
DOI
10.1590/1806-9282.20230848
Chinese Library Classification
R5 [Internal Medicine]
Subject Classification Code
1002; 100201
Abstract
OBJECTIVE: The aim of this study was to evaluate the performance of ChatGPT-4.0 in answering the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and to use it as a tool for feedback on the quality of the examination.
METHODS: Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the official test solutions, they classified each large language model answer as adequate, inadequate, or indeterminate; disagreements were adjudicated until a consensus was reached on ChatGPT's accuracy. Performance across medical themes and on nullified questions was compared using chi-square analysis.
RESULTS: ChatGPT-4.0 answered 71 (87.7%) of the Revalida questions correctly and 10 (12.3%) incorrectly. The proportion of correct answers did not differ significantly across medical themes (p=0.4886). The model's accuracy was lower (71.4%) on nullified questions, with no statistically significant difference between nullified and non-nullified groups (p=0.241).
CONCLUSION: ChatGPT-4.0 showed satisfactory performance on the 2022 Brazilian National Examination for Medical Degree Revalidation. The large language model performed worse on subjective questions and on public healthcare themes. These results suggest that the overall quality of the Revalida examination questions is satisfactory and corroborate the decision to nullify the annulled questions.
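As a rough illustration of the kind of chi-square comparison the abstract describes, the sketch below contrasts accuracy on nullified versus non-nullified questions. The cell counts are assumptions inferred from the reported percentages (71/81 correct overall; 71.4% on nullified items, consistent with 5 of 7), and scipy.stats.chi2_contingency is used as a stand-in; the authors' exact procedure and software are not stated in the abstract, so the resulting p-value may differ from the reported 0.241.

    # Hedged sketch (not the authors' code): chi-square test of ChatGPT-4.0
    # accuracy on nullified vs. non-nullified Revalida questions.
    # Counts are inferred from the abstract's percentages and are assumptions.
    from scipy.stats import chi2_contingency

    #                correct  incorrect
    non_nullified = [66, 8]   # 74 questions (inferred)
    nullified     = [5, 2]    # 7 questions, 71.4% accuracy (reported)

    table = [non_nullified, nullified]
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2={chi2:.3f}, p={p:.3f}, dof={dof}")
    # With expected counts this small, Fisher's exact test may be preferable;
    # a different test choice could explain the abstract's p=0.241.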
Pages: 5