Influence of Model Evolution and System Roles on ChatGPT's Performance in Chinese Medical Licensing Exams: Comparative Study

Cited by: 2

Authors:

Ming, Shuai ^{[1,2]}

Guo, Qingge ^{[1,2,3]}

Cheng, Wenjun ^{[4]}

Lei, Bo ^{[1,2,3]}

Affiliations:

[1] Henan Eye Hosp, Henan Prov Peoples Hosp, Dept Ophthalmol, 7 Weiwu Rd, Zhengzhou 450003, Peoples R China

[2] Henan Acad Innovat Med Sci, Eye Inst, Zhengzhou, Peoples R China

[3] Zhengzhou Univ, Henan Clin Res Ctr Ocular Dis, Peoples Hosp, Zhengzhou, Peoples R China

[4] Zhengzhou Univ, Dept Ophthalmol, Peoples Hosp, Zhengzhou, Peoples R China

Source:

JMIR MEDICAL EDUCATION | 2024 / Vol. 10

Keywords:

ChatGPT; Chinese National Medical Licensing Examination; large language models; medical education; system role; LLM; LLMs; language model; language models; artificial intelligence; chatbot; chatbots; conversational agent; conversational agents; exam; exams; examination; examinations; OpenAI; answer; answers; response; responses; accuracy; performance; China; Chinese; CHATGPT; GPT-4;

DOI:

10.2196/52784

Chinese Library Classification (CLC) number:

G40 [Education];

Discipline classification code:

040101 ; 120403 ;

Abstract:

Background: With the increasing application of large language models like ChatGPT in various industries, their potential in the medical domain, especially in standardized examinations, has become a focal point of research.
Objective: The aim of this study is to assess the clinical performance of ChatGPT, focusing on its accuracy and reliability in the Chinese National Medical Licensing Examination (CNMLE).
Methods: The CNMLE 2022 question set, consisting of 500 single-answer multiple-choice questions, was reclassified into 15 medical subspecialties. Each question was tested 8 to 12 times in Chinese on the OpenAI platform from April 24 to May 15, 2023. Three key factors were considered: the model version (GPT-3.5 or GPT-4.0), the prompt's designation of system roles tailored to medical subspecialties, and repetition for coherence. The passing accuracy threshold was set at 60%. Chi-square tests and kappa values were used to evaluate the model's accuracy and consistency.
Results: GPT-4.0 achieved a passing accuracy of 72.7%, which was significantly higher than that of GPT-3.5 (54%; P<.001). The variability rate of repeated responses from GPT-4.0 was lower than that of GPT-3.5 (9% vs 19.5%; P<.001). However, both models showed relatively good response coherence, with kappa values of 0.778 and 0.610, respectively. System roles numerically increased accuracy for both GPT-4.0 (0.3%-3.7%) and GPT-3.5 (1.3%-4.5%), and reduced variability by 1.7% and 1.8%, respectively (P>.05). In subgroup analysis, ChatGPT achieved comparable accuracy among different question types (P>.05). GPT-4.0 surpassed the accuracy threshold in 14 of 15 subspecialties, while GPT-3.5 did so in 7 of 15 on the first response.
Conclusions: GPT-4.0 passed the CNMLE and outperformed GPT-3.5 in key areas such as accuracy, consistency, and medical subspecialty expertise. Adding a system role enhanced the model's reliability and answer coherence, but not significantly. GPT-4.0 showed promising potential in medical education and clinical practice, meriting further study.

Pages: 11
