Evaluating ChatGPT-4's Performance in Identifying Radiological Anatomy in FRCR Part 1 Examination Questions

Cited by: 0
Authors
Sarangi, Pradosh Kumar [1 ]
Datta, Suvrankar [2 ]
Panda, Braja Behari [3 ]
Panda, Swaha [4 ]
Mondal, Himel [5 ]
Affiliations
[1] All India Inst Med Sci, Dept Radiodiag, Deoghar 814152, Jharkhand, India
[2] All India Inst Med Sci, Dept Radiodiag, New Delhi, India
[3] Veer Surendra Sai Inst Med Sci & Res, Dept Radiodiag, Burla, Odisha, India
[4] All India Inst Med Sci, Dept Otorhinolaryngol & Head Neck Surg, Deoghar, Jharkhand, India
[5] All India Inst Med Sci, Dept Physiol, Deoghar, Jharkhand, India
Keywords
artificial intelligence; ChatGPT-4; large language model; radiology; FRCR; anatomy; fellowship; GPT-4
DOI
10.1055/s-0044-1792040
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline Classification Codes
1002; 100207; 1009
Abstract
Background: Radiology is critical for diagnosis and patient care and relies heavily on accurate image interpretation. Recent advances in artificial intelligence (AI) and natural language processing (NLP) have raised interest in the potential of AI models to support radiologists, although robust research on AI performance in this field is still emerging.

Objective: This study aimed to assess the efficacy of ChatGPT-4 in answering radiological anatomy questions similar to those in the Fellowship of the Royal College of Radiologists (FRCR) Part 1 Anatomy examination.

Methods: We used 100 mock radiological anatomy questions from a free website patterned after the FRCR Part 1 Anatomy examination. ChatGPT-4 was tested under two conditions: with and without context about the examination instructions and question format. The main query posed was: "Identify the structure indicated by the arrow(s)." Responses were evaluated against the correct answers, and two expert radiologists (with >5 and 30 years of experience, respectively, in diagnostic radiology and academics) rated the explanations accompanying the answers. We calculated four scores: correctness, sidedness, modality identification, and approximation. The approximation score awards partial credit when the identified structure is present in the image but is not the structure the question asks about.

Results: ChatGPT-4 underperformed under both testing conditions, with correctness scores of 4% without context and 7.5% with context. However, it identified the imaging modality with 100% accuracy. The model scored over 50% on the approximation metric, identifying structures that were present in the image but not indicated by the arrow. It struggled to identify the correct side of a structure, scoring approximately 42% and 40% in the no-context and with-context settings, respectively. Only 32% of the responses were similar across the two settings.

Conclusion: Despite its ability to correctly recognize the imaging modality, ChatGPT-4 has significant limitations in interpreting normal radiological anatomy. This indicates the necessity for enhanced training on normal anatomy to better interpret abnormal radiological images. Identifying the correct side of structures in radiological images also remains a challenge for ChatGPT-4.
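The abstract describes the four scoring metrics only in prose. As a rough illustration of how such per-question gradings might be aggregated into the reported percentages, here is a minimal Python sketch. The authors publish no code, so every name below (the GradedResponse record, its fields, and the score function) is a hypothetical construction from the abstract's description, not the study's actual method.

```python
# Illustrative sketch only: the paper does not publish code, so the data
# structure and scoring rules below are assumptions based on the abstract.
from dataclasses import dataclass


@dataclass
class GradedResponse:
    correct_structure: bool  # named the structure the arrow indicates
    correct_side: bool       # identified the correct side (left vs. right)
    correct_modality: bool   # identified the imaging modality (CT, MRI, ...)
    structure_present: bool  # named a structure visible in the image, even
                             # if not the one the arrow indicates


def score(responses: list[GradedResponse]) -> dict[str, float]:
    """Aggregate the four percentage scores described in the abstract."""
    n = len(responses)
    return {
        "correctness": 100 * sum(r.correct_structure for r in responses) / n,
        "sidedness": 100 * sum(r.correct_side for r in responses) / n,
        "modality": 100 * sum(r.correct_modality for r in responses) / n,
        # Approximation: partial credit when the named structure is present
        # in the image but is not the one the question asks about.
        "approximation": 100 * sum(r.structure_present for r in responses) / n,
    }


if __name__ == "__main__":
    # Two hypothetical graded responses, for demonstration only.
    demo = [
        GradedResponse(False, True, True, True),
        GradedResponse(True, False, True, True),
    ]
    print(score(demo))
```

Under this reading, each of the 100 questions yields one GradedResponse per testing condition, and the reported figures (e.g., 4% vs. 7.5% correctness) are the per-condition aggregates.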
Pages: 8