Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments

Cited by: 0
Authors
Dana Brin
Vera Sorin
Akhil Vaid
Ali Soroush
Benjamin S. Glicksberg
Alexander W. Charney
Girish Nadkarni
Eyal Klang
Affiliations
[1] Chaim Sheba Medical Center, Department of Diagnostic Imaging
[2] Tel-Aviv University, Faculty of Medicine
[3] Icahn School of Medicine at Mount Sinai, The Charles Bronfman Institute of Personalized Medicine
[4] Icahn School of Medicine at Mount Sinai, Division of Data-Driven and Digital Medicine (D3M)
[5] Icahn School of Medicine at Mount Sinai, Hasso Plattner Institute for Digital Health
DOI: not available
Abstract
The United States Medical Licensing Examination (USMLE) has been a subject of performance study for artificial intelligence (AI) models. However, their performance on questions involving USMLE soft skills remains unexplored. This study aimed to evaluate ChatGPT and GPT-4 on USMLE questions involving communication skills, ethics, empathy, and professionalism. We used 80 USMLE-style questions involving soft skills, taken from the USMLE website and the AMBOSS question bank. A follow-up query was used to assess the models’ consistency. The performance of the AI models was compared to that of previous AMBOSS users. GPT-4 outperformed ChatGPT, correctly answering 90% compared to ChatGPT’s 62.5%. GPT-4 showed more confidence, not revising any responses, while ChatGPT modified its original answers 82.5% of the time. The performance of GPT-4 was higher than that of AMBOSS's past users. Both AI models, notably GPT-4, showed capacity for empathy, indicating AI's potential to meet the complex interpersonal, ethical, and professional demands intrinsic to the practice of medicine.
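The percentages in the abstract can be turned back into question counts as a quick sanity check. The sketch below assumes that all three figures share the 80-question denominator; the abstract does not state this explicitly, and the dictionary keys and the `implied_count` helper are illustrative names, not part of the study.

```python
TOTAL_QUESTIONS = 80  # number of USMLE-style questions stated in the abstract

# Percentages reported in the abstract. Treating each one as a share of
# all 80 questions is an assumption made for this sketch.
REPORTED_PERCENT = {
    "gpt4_correct": 90.0,
    "chatgpt_correct": 62.5,
    "chatgpt_revised": 82.5,
}


def implied_count(percent, total=TOTAL_QUESTIONS):
    """Return the whole-number count implied by `percent` of `total`."""
    count = percent * total / 100
    if count != round(count):
        # A fractional count would mean the assumed denominator is wrong.
        raise ValueError(f"{percent}% of {total} is not a whole number")
    return round(count)


implied = {name: implied_count(p) for name, p in REPORTED_PERCENT.items()}

for name, count in implied.items():
    print(f"{name}: {count} of {TOTAL_QUESTIONS}")
```

Under that assumption, every reported percentage maps to a whole number of questions, which is at least consistent with a common denominator of 80.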
Related papers
50 items in total
  • [31] ChatGPT surges ahead: GPT-4 has arrived in the arena of medical research
    Wang, Ying-Mei
    Chen, Tzeng-Ji
    JOURNAL OF THE CHINESE MEDICAL ASSOCIATION, 2023, 86 (09) : 784 - 785
  • [32] Improving the Readability of Generated Tests Using GPT-4 and ChatGPT Code Interpreter
    Gay, Gregory
    SEARCH-BASED SOFTWARE ENGINEERING, SSBSE 2023, 2024, 14415 : 140 - 146
  • [33] Artificial Intelligence in Intensive Care Medicine: Toward a ChatGPT/GPT-4 Way?
    Yanqiu Lu
    Haiyang Wu
    Shaoyan Qi
    Kunming Cheng
    Annals of Biomedical Engineering, 2023, 51 : 1898 - 1903
  • [34] Comparing the performance of ChatGPT GPT-4, Bard, and Llama-2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi-center psychiatrists
    Li, Dian-Jeng
    Kao, Yu-Chen
    Tsai, Shih-Jen
    Bai, Ya-Mei
    Yeh, Ta-Chuan
    Chu, Che-Sheng
    Hsu, Chih-Wei
    Cheng, Szu-Wei
    Hsu, Tien-Wei
    Liang, Chih-Sung
    Su, Kuan-Pin
    PSYCHIATRY AND CLINICAL NEUROSCIENCES, 2024, 78 (06) : 347 - 352
  • [35] Large language models for structured reporting in radiology: performance of GPT-4, ChatGPT-3.5, Perplexity and Bing
    Carlo A. Mallio
    Andrea C. Sertorio
    Caterina Bernetti
    Bruno Beomonte Zobel
    La radiologia medica, 2023, 128 : 808 - 812
  • [36] The performance of ChatGPT on orthopaedic in-service training exams: A comparative study of the GPT-3.5 turbo and GPT-4 models in orthopaedic education
    Rizzo, Michael G.
    Cai, Nathan
    Constantinescu, David
    JOURNAL OF ORTHOPAEDICS, 2024, 50 : 70 - 75
  • [37] Comparing GPT-3.5 and GPT-4 Accuracy and Drift in Radiology Diagnosis Please Cases
    Li, David
    Gupta, Kartik
    Bhaduri, Mousumi
    Sathiadoss, Paul
    Bhatnagar, Sahir
    Chong, Jaron
    RADIOLOGY, 2024, 310 (01)
  • [38] Large language models for structured reporting in radiology: performance of GPT-4, ChatGPT-3.5, Perplexity and Bing
    Mallio, Carlo A.
    Sertorio, Andrea C.
    Bernetti, Caterina
    Beomonte Zobel, Bruno
    RADIOLOGIA MEDICA, 2023, 128 (07) : 808 - 812
  • [39] Claude 3 Opus and ChatGPT With GPT-4 in Dermoscopic Image Analysis for Melanoma Diagnosis: Comparative Performance Analysis
    Liu, Xu
    Duan, Chaoli
    Kim, Min-kyu
    Zhang, Lu
    Jee, Eunjin
    Maharjan, Beenu
    Huang, Yuwei
    Du, Dan
    Jiang, Xian
    JMIR MEDICAL INFORMATICS, 2024, 12
  • [40] Performance of Novel GPT-4 in Otolaryngology Knowledge Assessment
    Revercomb, Lucy
    Patel, Aman M.
    Fu, Daniel
    Filimonov, Andrey
    INDIAN JOURNAL OF OTOLARYNGOLOGY AND HEAD & NECK SURGERY, 2024, 76 (06) : 6112 - 6114