Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study

Cited: 0
Authors
Yoon, Soo-Hyuk [1 ]
Oh, Seok Kyeong [2 ]
Lim, Byung Gun [2 ]
Lee, Ho-Jin [1 ]
Affiliations
[1] Seoul Natl Univ, Coll Med, Seoul Natl Univ Hosp, Dept Anesthesiol & Pain Med, Daehak Ro 101, Seoul 03080, South Korea
[2] Korea Univ, Guro Hosp, Coll Med, Dept Anesthesiol & Pain Med, Seoul, South Korea
Source
JMIR MEDICAL EDUCATION | 2024, Vol. 10
Keywords
AI tools; problem solving; anesthesiology; artificial intelligence; pain medicine; ChatGPT; health care; medical education; South Korea; BOARD;
DOI
10.2196/56859
Chinese Library Classification
G40 [Education];
Subject Classification Codes
040101; 120403;
Abstract
Background: ChatGPT has been tested in health care, including on the US Medical Licensing Examination and specialty examinations, showing near-passing results. Its performance in the field of anesthesiology has been assessed using English board examination questions; however, its effectiveness in Korea remains unexplored.

Objective: This study investigated the problem-solving performance of ChatGPT in the fields of anesthesiology and pain medicine in the Korean language context, highlighted advancements in artificial intelligence (AI), and explored its potential applications in medical education.

Methods: We investigated the performance (number of correct answers/number of questions) of GPT-4, GPT-3.5, and CLOVA X in the fields of anesthesiology and pain medicine, using the in-training examinations administered to Korean anesthesiology residents over the past 5 years (100 questions per year). Questions containing images, diagrams, or photographs were excluded from the analysis. Furthermore, to assess performance differences across languages, we conducted a comparative analysis of GPT-4's problem-solving proficiency using both the original Korean texts and their English translations.

Results: A total of 398 questions were analyzed. GPT-4 (67.8%) demonstrated significantly better overall performance than GPT-3.5 (37.2%) and CLOVA X (36.7%), whereas GPT-3.5 and CLOVA X did not differ significantly from each other. Additionally, GPT-4 performed better on the questions translated into English than on the Korean originals, indicating a language processing discrepancy (English: 75.4% vs Korean: 67.8%; difference 7.5%; 95% CI 3.1%-11.9%; P=.001).

Conclusions: This study underscores the potential of AI tools, such as ChatGPT, in medical education and practice but emphasizes the need for cautious application and further refinement, especially in non-English medical contexts. The findings suggest that although AI advancements are promising, they require careful evaluation and development to ensure acceptable performance across diverse linguistic and professional settings.
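The language comparison in the Results is a paired design: the same 398 questions were answered in Korean and in English, so the reported difference, confidence interval, and P value correspond to the kind of analysis handled by McNemar's test with a Wald interval for paired proportions; the same routine applies to pairwise model comparisons on a shared question set. The Python sketch below is illustrative only, not the authors' analysis code, and the per-question correctness vectors it takes are hypothetical stand-ins, since this record reports only aggregate figures.

import math
from scipy.stats import binomtest  # exact binomial test on the discordant pairs

def paired_proportion_comparison(correct_a, correct_b, z=1.96):
    # correct_a, correct_b: equal-length boolean sequences, one entry per
    # question (e.g., GPT-4 on the English vs Korean version of each item).
    n = len(correct_a)
    # Only discordant pairs carry information in a paired comparison.
    b = sum(1 for a, k in zip(correct_a, correct_b) if a and not k)  # A-only correct
    c = sum(1 for a, k in zip(correct_a, correct_b) if k and not a)  # B-only correct
    diff = (b - c) / n  # difference in proportions correct
    # Wald standard error for a difference in paired proportions
    se = math.sqrt((b + c) / n - diff ** 2) / math.sqrt(n)
    # Exact McNemar test: under the null, discordant pairs split 50/50.
    p = binomtest(b, b + c, 0.5).pvalue if (b + c) > 0 else 1.0
    return diff, (diff - z * se, diff + z * se), p

# Toy call with made-up vectors (not study data):
english = [True, True, False, True]
korean = [True, False, False, False]
print(paired_proportion_comparison(english, korean))

Reproducing the paper's exact interval would additionally require the per-item discordance counts, which this record does not provide.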
Pages: 10
Related Articles
50 records in total
  • [31] Do United States Medical Licensing Examination (USMLE) Scores Predict In-Training Test Performance for Emergency Medicine Residents?
    Thundiyil, Josef G.
    Modica, Renee F.
    Silvestri, Salvatore
    Papa, Linda
    JOURNAL OF EMERGENCY MEDICINE, 2010, 38 (01) : 65 - 69
  • [32] The effect of a toxicology standardized curriculum on toxicology section In-Training Examination scores of emergency medicine residents
    Boyd, Molly
    CLINICAL TOXICOLOGY, 2017, 55 (07) : 809 - 809
  • [33] Musculoskeletal Knowledge on the in-Training Examination Improves in Family Medicine Residents Participating in a Longitudinal Sports Medicine Clinical Track
    Furr, Micah
    Tumin, Dmitry
    Ferderber, Megan L.
    JOURNAL OF MEDICAL EDUCATION AND CURRICULAR DEVELOPMENT, 2024, 11
  • [34] Factors Predictive of Orthopaedic In-training Examination Performance and Research Productivity Among Orthopaedic Residents
    Kreitz, Tyler
    Verma, Satyendra
    Adan, Alexei
    Verma, Kushagra
    JOURNAL OF THE AMERICAN ACADEMY OF ORTHOPAEDIC SURGEONS, 2019, 27 (06) : E286 - E292
  • [35] Performance of US and international medical graduates on the 1995 Internal Medicine In-Training Examination
    Waxman, HS
    Garibaldi, RA
    Subhiyah, RG
    ANNALS OF INTERNAL MEDICINE, 1996, 125 (02) : 158 - 158
  • [36] Validation of the General Medicine in-Training Examination Using the Professional and Linguistic Assessments Board Examination Among Postgraduate Residents in Japan
    Nagasaki, Kazuya
    Nishizaki, Yuji
    Nojima, Masanori
    Shimizu, Taro
    Konishi, Ryota
    Okubo, Tomoya
    Yamamoto, Yu
    Morishima, Ryo
    Kobayashi, Hiroyuki
    Tokuda, Yasuharu
    INTERNATIONAL JOURNAL OF GENERAL MEDICINE, 2021, 14 : 6487 - 6495
  • [37] Impact of Question Bank Use for In-Training Examination Preparation by OBGYN Residents - A Multicenter Study
    Green, Isabel
    Weaver, Amy
    Kircher, Samantha
    Levy, Gary
    Brady, Robert Michael
    Flicker, Amanda B.
    Gala, Rajiv B.
    Peterson, Joseph
    Decesare, Julie
    Breitkopf, Daniel
    JOURNAL OF SURGICAL EDUCATION, 2022, 79 (03) : 775 - 782
  • [38] Reading Habits of General Surgery Residents and Association With American Board of Surgery In-Training Examination Performance
    Kim, Jerry J.
    Kim, Dennis Y.
    Kaji, Amy H.
    Gifford, Edward D.
    Reid, Christopher
    Sidwell, Richard A.
    Reeves, Mark E.
    Hartranft, Thomas H.
    Inaba, Kenji
    Jarman, Benjamin T.
    Are, Chandrakanth
    Galante, Joseph M.
    Amersi, Farin
    Smith, Brian R.
    Melcher, Marc L.
    Nelson, Timothy
    Donahue, Timothy
    Jacobsen, Garth
    Arnell, Tracey D.
    de Virgilio, Christian
    JAMA SURGERY, 2015, 150 (09) : 882 - 889
  • [39] Does Correlation of Faculty Assessment of Emergency Medicine Residents' Medical Knowledge Competency With Performance on the In-Training Examination Improve With Advancement Through the Program?
    Barlas, D.
    Ryan, J. G.
    ANNALS OF EMERGENCY MEDICINE, 2009, 54 (03) : S33 - S33
  • [40] Does the Preferred Study Source Impact Orthopedic In-Training Examination Performance?
    Theismann, Jeffrey J.
    Solberg, Erik J.
    Agel, Julie
    Dyer, George S.
    Egol, Kenneth A.
    Israelite, Craig L.
    Karam, Matthew D.
    Kim, Hubert
    Klein, Sandra E.
    Kweon, Christopher Y.
    LaPorte, Dawn M.
    Van Heest, Ann
    JOURNAL OF SURGICAL EDUCATION, 2022, 79 (01) : 266 - 273