Performance of ChatGPT in the In-Training Examination for Anesthesiology and Pain Medicine Residents in South Korea: Observational Study

Cited: 0
Authors
Yoon, Soo-Hyuk [1 ]
Oh, Seok Kyeong [2 ]
Lim, Byung Gun [2 ]
Lee, Ho-Jin [1 ]
Affiliations
[1] Seoul Natl Univ, Coll Med, Seoul Natl Univ Hosp, Dept Anesthesiol & Pain Med, Daehak Ro 101, Seoul 03080, South Korea
[2] Korea Univ, Guro Hosp, Coll Med, Dept Anesthesiol & Pain Med, Seoul, South Korea
Source
JMIR MEDICAL EDUCATION, 2024, Vol. 10
Keywords
AI tools; problem solving; anesthesiology; artificial intelligence; pain medicine; ChatGPT; health care; medical education; South Korea; BOARD;
DOI
10.2196/56859
CLC Classification Number
G40 [Education];
Subject Classification Number
040101; 120403;
Abstract
Background: ChatGPT has been tested in health care, including the US Medical Licensing Examination and specialty exams, showing near-passing results. Its performance in the field of anesthesiology has been assessed using English board examination questions; however, its effectiveness in Korea remains unexplored. Objective: This study investigated the problem-solving performance of ChatGPT in the fields of anesthesiology and pain medicine in the Korean language context, highlighted advancements in artificial intelligence (AI), and explored its potential applications in medical education. Methods: We investigated the performance (number of correct answers/number of questions) of GPT-4, GPT-3.5, and CLOVA X in the fields of anesthesiology and pain medicine, using in-training examinations administered to Korean anesthesiology residents over the past 5 years, with an annual composition of 100 questions. Questions containing images, diagrams, or photographs were excluded from the analysis. Furthermore, to assess performance differences of the GPT models across languages, we conducted a comparative analysis of GPT-4's problem-solving proficiency using both the original Korean texts and their English translations. Results: A total of 398 questions were analyzed. GPT-4 (67.8%) demonstrated significantly better overall performance than GPT-3.5 (37.2%) and CLOVA X (36.7%). However, GPT-3.5 and CLOVA X did not differ significantly in overall performance. Additionally, GPT-4 showed superior performance on questions translated into English, indicating a language-processing discrepancy (English: 75.4% vs Korean: 67.8%; difference 7.5%; 95% CI 3.1%-11.9%; P=.001). Conclusions: This study underscores the potential of AI tools, such as ChatGPT, in medical education and practice but emphasizes the need for cautious application and further refinement, especially in non-English medical contexts.
The findings suggest that although AI advancements are promising, they require careful evaluation and development to ensure acceptable performance across diverse linguistic and professional settings.
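As a rough numeric check on the reported Results, the sketch below recomputes the English-vs-Korean accuracy difference for GPT-4. The correct-answer counts (300 and 270 of 398) are back-calculated from the reported percentages rather than taken from the paper, and a simple independent-samples Wald interval is used for illustration; the study's within-question (paired) comparison would be expected to yield a narrower interval, consistent with the reported 95% CI of 3.1%-11.9%.

```python
from math import sqrt

def two_proportion_diff_ci(x1, n1, x2, n2, z=1.96):
    """Point estimate and Wald 95% CI for the difference of two
    independent proportions (an illustrative approximation only)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, (diff - z * se, diff + z * se)

# Hypothetical counts back-calculated from the abstract's percentages:
# English 75.4% of 398 ≈ 300 correct; Korean 67.8% of 398 ≈ 270 correct.
diff, (lo, hi) = two_proportion_diff_ci(300, 398, 270, 398)
print(f"difference = {diff:.1%}, 95% CI ({lo:.1%}, {hi:.1%})")
```

The point estimate reproduces the reported 7.5% difference; the unpaired interval is wider than the paper's, as expected for a paired design analyzed without the pairing.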
Pages: 10