Evaluating the limits of AI in medical specialisation: ChatGPT's performance on the UK Neurology Specialty Certificate Examination

Cited by: 38
Author
Giannos, Panagiotis [1,2,3]
Affiliations
[1] Imperial College London, Department of Life Sciences, London, England
[2] Society of Meta-Research and Biomedical Innovation, London, England
[3] Promot Emerging & Evaluat Res Soc, London, England
Keywords
clinical neurology; medicine; health policy & practice
DOI
10.1136/bmjno-2023-000451
Chinese Library Classification (CLC)
R74 [Neurology and Psychiatry]
Discipline Classification Code
Abstract
Background: Large language models such as ChatGPT have demonstrated potential as innovative tools for medical education and practice, with studies showing their ability to perform at or near the passing threshold in general medical examinations and standardised admission tests. However, no studies have assessed their performance in the UK medical education context, particularly at a specialty level, and specifically in the field of neurology and neuroscience.
Methods: We evaluated the performance of ChatGPT in higher specialty training for neurology and neuroscience using 69 questions from the Pool-Specialty Certificate Examination (SCE) Neurology Web Questions bank. The dataset primarily focused on neurology (80%). The questions spanned subtopics such as symptoms and signs, diagnosis, interpretation and management, with some questions addressing specific patient populations. The performance of the ChatGPT 3.5 Legacy, ChatGPT 3.5 Default and ChatGPT-4 models was evaluated and compared.
Results: ChatGPT 3.5 Legacy and ChatGPT 3.5 Default displayed overall accuracies of 42% and 57%, respectively, falling short of the passing threshold of 58% for the 2022 SCE neurology examination. ChatGPT-4, on the other hand, achieved the highest accuracy of 64%, surpassing the passing threshold and outperforming its predecessors across disciplines and subtopics.
Conclusions: The advancements in ChatGPT-4's performance compared with its predecessors demonstrate the potential for artificial intelligence (AI) models in specialised medical education and practice. However, our findings also highlight the need for ongoing development and collaboration between AI developers and medical experts to ensure the models' relevance and reliability in the rapidly evolving field of medicine.
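The evaluation described in the abstract reduces to marking each model's multiple-choice answers against an answer key, computing per-model accuracy, and comparing it with the 58% pass mark. Below is a minimal Python sketch of that scoring step, under stated assumptions: the question numbers, answer key and model responses (answer_key, model_answers) are hypothetical placeholders for illustration, not data from the study.

    # Minimal sketch of the scoring procedure implied by the abstract:
    # mark each multiple-choice answer against the key, compute per-model
    # accuracy, and compare with the pass mark. All question IDs, answers
    # and responses below are hypothetical placeholders, not study data.

    PASS_MARK = 0.58  # 2022 SCE neurology passing threshold reported above

    # Hypothetical answer key and model responses, keyed by question number.
    answer_key = {1: "B", 2: "D", 3: "A"}
    model_answers = {
        "ChatGPT 3.5 Legacy":  {1: "B", 2: "C", 3: "C"},
        "ChatGPT 3.5 Default": {1: "B", 2: "D", 3: "C"},
        "ChatGPT-4":           {1: "B", 2: "D", 3: "A"},
    }

    def accuracy(responses: dict[int, str], key: dict[int, str]) -> float:
        """Fraction of questions answered correctly."""
        correct = sum(1 for q, ans in key.items() if responses.get(q) == ans)
        return correct / len(key)

    for model, responses in model_answers.items():
        acc = accuracy(responses, answer_key)
        verdict = "pass" if acc >= PASS_MARK else "fail"
        print(f"{model}: {acc:.0%} -> {verdict} at a {PASS_MARK:.0%} threshold")

For scale, applying the 58% mark to the 69-question set used here would require at least 41 correct answers; ChatGPT-4's reported 64% corresponds to roughly 44 correct.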
Pages: 4