Clinical Knowledge and Reasoning Abilities of AI Large Language Models in Anesthesiology: A Comparative Study on the American Board of Anesthesiology Examination

Cited by: 5
Authors
Angel, Mirana C. [1 ,2 ]
Rinehart, Joseph B. [3 ]
Cannesson, Maxime P. [4 ]
Baldi, Pierre [1 ,2 ]
Affiliations
[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Inst Genom & Bioinformat, Irvine, CA USA
[3] Univ Calif Irvine, Dept Anesthesiol & Perioperat Care, Irvine, CA USA
[4] Univ Calif Los Angeles, Dept Anesthesiol & Perioperat Med, Los Angeles, CA USA
Source
ANESTHESIA AND ANALGESIA | 2024, Vol. 139, No. 2
DOI
10.1213/ANE.0000000000006892
Chinese Library Classification
R614 [Anesthesiology]
Discipline Code
100217
Abstract
BACKGROUND: Over the past decade, artificial intelligence (AI) has expanded significantly, with increased adoption across various industries, including medicine. Recently, AI-based large language models such as Generative Pretrained Transformer-3 (GPT-3), Bard, and Generative Pretrained Transformer-4 (GPT-4) have demonstrated remarkable language capabilities. While previous studies have explored their potential in general medical knowledge tasks, here we assess their clinical knowledge and reasoning abilities in a specialized medical context.
METHODS: We studied and compared the performance of all 3 models on both the written and oral portions of the comprehensive and challenging American Board of Anesthesiology (ABA) examination, which evaluates candidates' knowledge and competence in anesthesia practice.
RESULTS: Our results reveal that only GPT-4 passed the written examination, achieving an accuracy of 78% on the basic section and 80% on the advanced section. In comparison, the older or smaller GPT-3 and Bard models scored 58% and 47% on the basic examination, and 50% and 46% on the advanced examination, respectively. Consequently, only GPT-4 was evaluated in the oral examination, with examiners concluding that it had a reasonable possibility of passing the structured oral examination. Additionally, we observe that these models exhibit varying degrees of proficiency across distinct topics, which could serve as an indicator of the relative quality of information in the corresponding training datasets. This may also act as a predictor of which anesthesiology subspecialty is most likely to see the earliest integration with AI.
CONCLUSIONS: GPT-4 outperformed GPT-3 and Bard on both the basic and advanced sections of the written ABA examination, and actual board examiners considered GPT-4 to have a reasonable possibility of passing the real oral examination; these models also exhibit varying degrees of proficiency across distinct topics.
Pages: 349-356 (8 pages)