Influence of Model Evolution and System Roles on ChatGPT's Performance in Chinese Medical Licensing Exams: Comparative Study

Times cited: 2
Authors
Ming, Shuai [1,2]
Guo, Qingge [1,2,3]
Cheng, Wenjun [4]
Lei, Bo [1,2,3]
Affiliations
[1] Henan Eye Hosp, Henan Prov Peoples Hosp, Dept Ophthalmol, 7 Weiwu Rd, Zhengzhou 450003, Peoples R China
[2] Henan Acad Innovat Med Sci, Eye Inst, Zhengzhou, Peoples R China
[3] Zhengzhou Univ, Henan Clin Res Ctr Ocular Dis, Peoples Hosp, Zhengzhou, Peoples R China
[4] Zhengzhou Univ, Dept Ophthalmol, Peoples Hosp, Zhengzhou, Peoples R China
Source
JMIR MEDICAL EDUCATION | 2024 / Vol. 10
Keywords
ChatGPT; Chinese National Medical Licensing Examination; large language models; medical education; system role; LLM; LLMs; language model; language models; artificial intelligence; chatbot; chatbots; conversational agent; conversational agents; exam; exams; examination; examinations; OpenAI; answer; answers; response; responses; accuracy; performance; China; Chinese; CHATGPT; GPT-4
DOI
10.2196/52784
Chinese Library Classification (CLC) number
G40 [Education]
Subject classification codes
040101; 120403
Abstract
Background: With the increasing application of large language models such as ChatGPT across industries, their potential in the medical domain, especially in standardized examinations, has become a focal point of research.
Objective: This study aims to assess the clinical performance of ChatGPT, focusing on its accuracy and reliability in the Chinese National Medical Licensing Examination (CNMLE).
Methods: The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical subspecialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the model version (GPT-3.5 or GPT-4.0), the prompt's designation of system roles tailored to medical subspecialties, and repetition to assess coherence. The passing accuracy threshold was set at 60%. Chi-square tests and kappa values were used to evaluate the models' accuracy and consistency.
Results: GPT-4.0 achieved a passing accuracy of 72.7%, significantly higher than that of GPT-3.5 (54%; P<.001). The variability rate of repeated responses was lower for GPT-4.0 than for GPT-3.5 (9% vs 19.5%; P<.001). However, both models showed relatively good response coherence, with kappa values of 0.778 and 0.610, respectively. System roles numerically increased accuracy for both GPT-4.0 (0.3%-3.7%) and GPT-3.5 (1.3%-4.5%) and reduced variability by 1.7% and 1.8%, respectively (P>.05). In subgroup analysis, ChatGPT achieved comparable accuracy across question types (P>.05). On the first response, GPT-4.0 surpassed the accuracy threshold in 14 of 15 subspecialties, whereas GPT-3.5 did so in 7 of 15.
Conclusions: GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in key areas such as accuracy, consistency, and medical subspecialty expertise. Adding a system role enhanced the model's reliability and answer coherence, though not significantly. GPT-4.0 showed promising potential in medical education and clinical practice, meriting further study.
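The abstract names chi-square tests for accuracy and kappa values for response consistency but does not spell out the exact procedure. As a minimal sketch, assuming Cohen's kappa computed between two repeated runs of answer letters and an uncorrected Pearson chi-square on a 2x2 table of correct/incorrect counts, the two statistics can be computed as follows. The function names `cohen_kappa` and `chi_square_2x2` are hypothetical, and the study may have used a different kappa variant (for example, Fleiss' kappa across all 8 to 12 repetitions) or a continuity correction.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def cohen_kappa(run_a: Sequence[str], run_b: Sequence[str]) -> float:
    """Cohen's kappa between two runs of answer letters for the same questions."""
    if not run_a or len(run_a) != len(run_b):
        raise ValueError("runs must be non-empty and of equal length")
    n = len(run_a)
    # Observed agreement: share of questions answered identically in both runs.
    p_o = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Chance agreement from each run's marginal answer frequencies.
    freq_a, freq_b = Counter(run_a), Counter(run_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Both runs give the same single answer everywhere, so kappa is 0/0.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def chi_square_2x2(correct_a: int, total_a: int,
                   correct_b: int, total_b: int) -> Tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) comparing two accuracies."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("totals must be positive")
    if not (0 <= correct_a <= total_a and 0 <= correct_b <= total_b):
        raise ValueError("correct counts must lie between 0 and the totals")
    # Rows: model A, model B; columns: correct, incorrect.
    table = [[correct_a, total_a - correct_a],
             [correct_b, total_b - correct_b]]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_sums)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            if expected == 0:
                raise ValueError("an expected cell count is zero")
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 df, chi-square = Z**2, so P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `chi_square_2x2(30, 50, 20, 50)` on hypothetical counts gives a statistic of 4.0 with P≈.0455; these inputs are illustrative only and are not taken from the study.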
Pages: 11