Assessing GPT-4 multimodal performance in radiological image analysis

Times cited: 5
Authors
Brin, Dana [1, 2]
Sorin, Vera [1, 2, 3]
Barash, Yiftach [1, 2, 3]
Konen, Eli [1, 2]
Glicksberg, Benjamin S. [4]
Nadkarni, Girish N. [5, 6]
Klang, Eyal [1, 2, 3, 5, 6]
Affiliations
[1] Chaim Sheba Med Ctr, Dept Diagnost Imaging, Tel Hashomer, Israel
[2] Tel Aviv Univ, Fac Med, Tel Aviv, Israel
[3] Chaim Sheba Med Ctr, DeepVis Lab, Tel Hashomer, Israel
[4] Icahn Sch Med Mt Sinai, Hasso Plattner Inst Digital Hlth, New York, NY USA
[5] Icahn Sch Med Mt Sinai, Div Data Driven & Digital Med D3M, New York, NY USA
[6] Icahn Sch Med Mt Sinai, Charles Bronfman Inst Personalized Med, New York, NY USA
Keywords
Artificial intelligence; Diagnostic imaging; Radiology; Ultrasonography; Computed tomography (x-ray)
DOI
10.1007/s00330-024-11035-5
Chinese Library Classification (CLC) numbers
R8 [Special medicine]; R445 [Diagnostic imaging]
Discipline classification codes
1002; 100207; 1009
Abstract
Objectives: This study aims to assess the performance of GPT-4V, a multimodal artificial intelligence (AI) model capable of analyzing both images and textual data, in interpreting radiological images. It focuses on a range of modalities, anatomical regions, and pathologies to explore the potential of zero-shot generative AI in enhancing diagnostic processes in radiology.

Methods: We used GPT-4V to analyze 230 anonymized emergency room diagnostic images, consecutively collected over 1 week. Modalities included ultrasound (US), computed tomography (CT), and X-ray images. The interpretations provided by GPT-4V were then compared with those of senior radiologists. This comparison aimed to evaluate the accuracy of GPT-4V in recognizing the imaging modality, anatomical region, and pathology present in the images.

Results: GPT-4V identified the imaging modality correctly in 100% of cases (221/221), the anatomical region in 87.1% (189/217), and the pathology in 35.2% (76/216). However, the model's performance varied significantly across modalities, with anatomical region identification accuracy ranging from 60.9% (39/64) in US images to 97% (98/101) in CT and 100% (52/52) in X-ray images (p < 0.001). Similarly, pathology identification ranged from 9.1% (6/66) in US images to 36.4% (36/99) in CT and 66.7% (34/51) in X-ray images (p < 0.001). These variations indicate inconsistencies in GPT-4V's ability to interpret radiological images accurately.

Conclusion: While the integration of AI in radiology, exemplified by multimodal GPT-4, offers promising avenues for diagnostic enhancement, the current capabilities of GPT-4V are not yet reliable for interpreting radiological images. This study underscores the need for ongoing development to achieve dependable performance in radiology diagnostics.
Clinical relevance statement: Although GPT-4V shows promise in radiological image interpretation, its high diagnostic hallucination rate (> 40%) indicates it cannot be trusted for clinical use as a standalone tool. Improvements are necessary to enhance its reliability and ensure patient safety.

Key Points...
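The per-modality counts in the Results paragraph of the abstract can be checked with a few lines of arithmetic. The sketch below recomputes each percentage, confirms that the per-modality counts sum to the pooled totals (189/217 and 76/216), and runs a Pearson chi-square test of independence across the three modalities. The abstract reports p < 0.001 but does not name the test, so the choice of an uncorrected Pearson chi-square here is an assumption, not necessarily the authors' method. The helper names (`ANATOMY`, `PATHOLOGY`, `accuracy_pct`, `pooled`, `chi_square_3x2`) are hypothetical and used only for illustration.

```python
import math

# Counts as reported in the abstract: (correct, total) per modality.
ANATOMY = {"US": (39, 64), "CT": (98, 101), "X-ray": (52, 52)}
PATHOLOGY = {"US": (6, 66), "CT": (36, 99), "X-ray": (34, 51)}


def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of correct interpretations, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


def pooled(table: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the correct and total counts across modalities."""
    return (sum(c for c, _ in table.values()), sum(n for _, n in table.values()))


def chi_square_3x2(table: dict[str, tuple[int, int]]) -> tuple[float, float]:
    """Pearson chi-square test of independence for a 3 x 2 (correct/incorrect) table.

    Assumption: an uncorrected Pearson test; the abstract does not name the test used.
    Returns (statistic, p_value). With 3 rows and 2 columns, df = 2, for which the
    chi-square survival function has the exact closed form exp(-x / 2).
    """
    rows = [(c, n - c) for c, n in table.values()]
    assert len(rows) == 3, "closed-form p-value below assumes df = 2"
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    grand = sum(col_totals)
    stat = 0.0
    for r in rows:
        row_total = sum(r)
        for j in range(2):
            expected = row_total * col_totals[j] / grand
            stat += (r[j] - expected) ** 2 / expected
    return stat, math.exp(-stat / 2.0)


if __name__ == "__main__":
    for name, table in (("anatomical region", ANATOMY), ("pathology", PATHOLOGY)):
        c, n = pooled(table)
        stat, p = chi_square_3x2(table)
        per_mod = ", ".join(f"{m} {accuracy_pct(*v)}%" for m, v in table.items())
        print(f"{name}: pooled {c}/{n} = {accuracy_pct(c, n)}%; {per_mod}; "
              f"chi2 = {stat:.1f}, p = {p:.1e}")
```

The closed form exp(-x/2) avoids a SciPy dependency but is valid only for two degrees of freedom; for other table shapes, a general chi-square survival function (for example, from a statistics library) would be needed.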
Pages: 1959-1965
Number of pages: 7
Related papers
50 in total
  • [1] Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination
    Kaneda, Yudai
    Takahashi, Ryo
    Kaneda, Uiri
    Akashima, Shiori
    Okita, Haruna
    Misaki, Sadaya
    Yamashiro, Akimi
    Ozaki, Akihiko
    Tanimoto, Tetsuya
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (08)
  • [2] Integration of GPT-4 into multimodal bioinformatics for surgical specimens
    Fan, Siqi
    Zheng, Yue
    Sun, Xu
    Zhao, Ailin
    Wu, Yijun
    INTERNATIONAL JOURNAL OF SURGERY, 2024, 110 (09) : 5854 - 5856
  • [3] GPT-4 Performance for Neurologic Localization
    Lee, Jung-Hyun
    Choi, Eunhee
    McDougal, Robert
    Lytton, William W.
    NEUROLOGY-CLINICAL PRACTICE, 2024, 14 (03)
  • [4] Assessing GPT-4's Performance in Delivering Medical Advice: Comparative Analysis With Human Experts
    Jo, Eunbeen
    Song, Sanghoun
    Kim, Jong-Ho
    Lim, Subin
    Kim, Ju Hyeon
    Cha, Jung-Joon
    Kim, Young-Min
    Joo, Hyung Joon
    JMIR MEDICAL EDUCATION, 2024, 10
  • [5] Performance of GPT-4 on Chinese Nursing Examination
    Miao, Yiqun
    Luo, Yuan
    Zhao, Yuhan
    Li, Jiawei
    Liu, Mingxuan
    Wang, Huiying
    Chen, Yuling
    Wu, Ying
    NURSE EDUCATOR, 2024, 49 (06) : E338 - E343
  • [6] Claude 3 Opus and ChatGPT With GPT-4 in Dermoscopic Image Analysis for Melanoma Diagnosis: Comparative Performance Analysis
    Liu, Xu
    Duan, Chaoli
    Kim, Min-kyu
    Zhang, Lu
    Jee, Eunjin
    Maharjan, Beenu
    Huang, Yuwei
    Du, Dan
    Jiang, Xian
    JMIR MEDICAL INFORMATICS, 2024, 12
  • [7] Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4
    Lahat, Adi
    Sharif, Kassem
    Zoabi, Narmin
    Patt, Yonatan Shneor
    Sharif, Yousra
    Fisher, Lior
    Shani, Uria
    Arow, Mohamad
    Levin, Roni
    Klang, Eyal
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [8] Automated Financial Analysis Using GPT-4
    Noels, Sander
    Merlevede, Adriaan
    Fecheyr, Andrew
    Vanhalst, Maarten
    Meerlaen, Nick
    Viaene, Sebastien
    De Bie, Tijl
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2023, PT VII, 2023, 14175 : 345 - 349
  • [10] Feasibility of GPT-3 and GPT-4 for in-Depth Patient Education Prior to Interventional Radiological Procedures: A Comparative Analysis
    Scheschenja, Michael
    Viniol, Simon
    Bastian, Moritz B.
    Wessendorf, Joel
    Koenig, Alexander M.
    Mahnken, Andreas H.
    CARDIOVASCULAR AND INTERVENTIONAL RADIOLOGY, 2024, 47 (02) : 245 - 250