Evaluating the limits of AI in medical specialisation: ChatGPT's performance on the UK Neurology Specialty Certificate Examination

Cited by: 38
Authors
Giannos, Panagiotis [1,2,3]
Affiliations
[1] Imperial Coll London, Dept Life Sci, London, England
[2] Soc Meta Res & Biomed Innovat, London, England
[3] Promot Emerging & Evaluat Res Soc, London, England
Keywords
clinical neurology; medicine; health policy & practice
DOI
10.1136/bmjno-2023-000451
Chinese Library Classification (CLC)
R74 [Neurology and Psychiatry]
Abstract
Background: Large language models such as ChatGPT have demonstrated potential as innovative tools for medical education and practice, with studies showing that they can perform at or near the passing threshold in general medical examinations and standardised admission tests. However, no studies have assessed their performance in the UK medical education context at the specialty level, specifically in neurology and neuroscience.
Methods: We evaluated the performance of ChatGPT in higher specialty training for neurology and neuroscience using 69 questions from the Pool-Specialty Certificate Examination (SCE) Neurology Web Questions bank. The dataset focused primarily on neurology (80%), with questions spanning subtopics such as symptoms and signs, diagnosis, interpretation and management; some questions addressed specific patient populations. The performance of the ChatGPT 3.5 Legacy, ChatGPT 3.5 Default and ChatGPT-4 models was evaluated and compared.
Results: ChatGPT 3.5 Legacy and ChatGPT 3.5 Default achieved overall accuracies of 42% and 57%, respectively, both falling short of the 58% passing threshold for the 2022 SCE Neurology examination. ChatGPT-4 achieved the highest accuracy at 64%, surpassing the passing threshold and outperforming its predecessors across disciplines and subtopics.
Conclusions: The improvement in ChatGPT-4's performance over its predecessors demonstrates the potential of artificial intelligence (AI) models in specialised medical education and practice. However, our findings also highlight the need for ongoing development and for collaboration between AI developers and medical experts to ensure the models' relevance and reliability in the rapidly evolving field of medicine.
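As a worked illustration of the pass/fail comparison reported in the Results, the following minimal Python sketch checks each model's reported overall accuracy against the 58% passing threshold. The model names and figures are taken from the abstract; the script itself is purely illustrative and is not the authors' analysis code.

```python
# Illustrative sketch (not the authors' code): compare the reported
# overall accuracies against the 2022 SCE Neurology passing threshold.

PASS_THRESHOLD = 0.58  # passing mark for the 2022 SCE Neurology examination

# Overall accuracies reported in the abstract (fraction of 69 questions correct)
reported_accuracy = {
    "ChatGPT 3.5 Legacy": 0.42,
    "ChatGPT 3.5 Default": 0.57,
    "ChatGPT-4": 0.64,
}

for model, acc in reported_accuracy.items():
    verdict = "pass" if acc >= PASS_THRESHOLD else "fail"
    print(f"{model}: {acc:.0%} -> {verdict}")
```

Running the sketch reproduces the abstract's conclusion: only ChatGPT-4, at 64%, clears the 58% threshold, while ChatGPT 3.5 Default misses it by a single percentage point.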
Pages: 4