Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions

Cited by: 0
Authors
Sarangi, Pradosh Kumar [1 ]
Datta, Suvrankar [2 ]
Panda, Braja Behari [3 ]
Panda, Swaha [4 ]
Mondal, Himel [5 ]
Affiliations
[1] All India Inst Med Sci, Dept Radiodiag, Deoghar 814152, Jharkhand, India
[2] All India Inst Med Sci, Dept Radiodiag, New Delhi, India
[3] Veer Surendra Sai Inst Med Sci & Res, Dept Radiodiag, Burla, Odisha, India
[4] All India Inst Med Sci, Dept Otorhinolaryngol & Head Neck Surg, Deoghar, Jharkhand, India
[5] All India Inst Med Sci, Dept Physiol, Deoghar, Jharkhand, India
Keywords
artificial intelligence; ChatGPT-4; large language model; radiology; FRCR; anatomy; fellowship; GPT-4
DOI
10.1055/s-0044-1792040
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline Classification Codes
1002; 100207; 1009
Abstract
Background: Radiology is critical for diagnosis and patient care and relies heavily on accurate image interpretation. Recent advances in artificial intelligence (AI) and natural language processing (NLP) have raised interest in the potential of AI models to support radiologists, although robust research on AI performance in this field is still emerging.

Objective: This study aimed to assess the efficacy of ChatGPT-4 in answering radiological anatomy questions similar to those in the Fellowship of the Royal College of Radiologists (FRCR) Part 1 Anatomy examination.

Methods: We used 100 mock radiological anatomy questions from a free website patterned after the FRCR Part 1 Anatomy examination. ChatGPT-4 was tested under two conditions: with and without context about the examination instructions and question format. The main query posed was: "Identify the structure indicated by the arrow(s)." Responses were evaluated against the correct answers, and two expert radiologists (with >5 and 30 years of experience, respectively, in diagnostic radiology and academics) rated the explanations accompanying the answers. We calculated four scores: correctness, sidedness, modality identification, and approximation. The approximation score awards partial credit when the identified structure is present in the image but is not the structure the question asks about.

Results: ChatGPT-4 underperformed under both testing conditions, with correctness scores of 4% without context and 7.5% with context. However, it identified the imaging modality with 100% accuracy. The model scored over 50% on the approximation metric, identifying structures that were present in the image but not indicated by the arrow. It struggled to identify the correct side of a structure, scoring approximately 42% and 40% in the no-context and with-context settings, respectively. Only 32% of the responses were similar across the two settings.

Conclusion: Despite its ability to correctly recognize the imaging modality, ChatGPT-4 has significant limitations in interpreting normal radiological anatomy. This indicates the necessity for enhanced training on normal anatomy to better interpret abnormal radiological images. Identifying the correct side of structures in radiological images also remains a challenge for ChatGPT-4.
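The abstract describes the four scoring metrics only in prose. As a rough illustration of how such per-question gradings might be aggregated into the reported percentages, here is a minimal Python sketch. The authors publish no code, so every name below (the GradedResponse record, its fields, and the score function) is a hypothetical construction from the abstract's description, not the study's actual method.

```python
# Illustrative sketch only: the paper does not publish code, so the data
# structure and scoring rules below are assumptions based on the abstract.
from dataclasses import dataclass


@dataclass
class GradedResponse:
    correct_structure: bool  # named the structure the arrow indicates
    correct_side: bool       # identified the correct side (left vs. right)
    correct_modality: bool   # identified the imaging modality (CT, MRI, ...)
    structure_present: bool  # named a structure visible in the image, even
                             # if not the one the arrow indicates


def score(responses: list[GradedResponse]) -> dict[str, float]:
    """Aggregate the four percentage scores described in the abstract."""
    n = len(responses)
    return {
        "correctness": 100 * sum(r.correct_structure for r in responses) / n,
        "sidedness": 100 * sum(r.correct_side for r in responses) / n,
        "modality": 100 * sum(r.correct_modality for r in responses) / n,
        # Approximation: partial credit when the named structure is present
        # in the image but is not the one the question asks about.
        "approximation": 100 * sum(r.structure_present for r in responses) / n,
    }


if __name__ == "__main__":
    # Two hypothetical graded responses, for demonstration only.
    demo = [
        GradedResponse(False, True, True, True),
        GradedResponse(True, False, True, True),
    ]
    print(score(demo))
```

Under this reading, each of the 100 questions yields one GradedResponse per testing condition, and the reported figures (e.g., 4% vs. 7.5% correctness) are the per-condition aggregates.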
Pages: 8