Assessing GPT-4 multimodal performance in radiological image analysis

Cited by: 5
Authors
Brin, Dana [1 ,2 ]
Sorin, Vera [1 ,2 ,3 ]
Barash, Yiftach [1 ,2 ,3 ]
Konen, Eli [1 ,2 ]
Glicksberg, Benjamin S. [4 ]
Nadkarni, Girish N. [5 ,6 ]
Klang, Eyal [1 ,2 ,3 ,5 ,6 ]
Affiliations
[1] Chaim Sheba Med Ctr, Dept Diagnost Imaging, Tel Hashomer, Israel
[2] Tel Aviv Univ, Fac Med, Tel Aviv, Israel
[3] Chaim Sheba Med Ctr, DeepVis Lab, Tel Hashomer, Israel
[4] Icahn Sch Med Mt Sinai, Hasso Plattner Inst Digital Hlth, New York, NY USA
[5] Icahn Sch Med Mt Sinai, Div Data Driven & Digital Med D3M, New York, NY USA
[6] Icahn Sch Med Mt Sinai, Charles Bronfman Inst Personalized Med, New York, NY USA
Keywords
Artificial intelligence; Diagnostic imaging; Radiology; Ultrasonography; Computed tomography (x-ray);
DOI
10.1007/s00330-024-11035-5
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline classification codes
1002; 100207; 1009;
Abstract
Objectives: This study aims to assess the performance of a multimodal artificial intelligence (AI) model capable of analyzing both images and textual data (GPT-4V) in interpreting radiological images. It focuses on a range of modalities, anatomical regions, and pathologies to explore the potential of zero-shot generative AI in enhancing diagnostic processes in radiology.
Methods: We analyzed 230 anonymized emergency room diagnostic images, consecutively collected over 1 week, using GPT-4V. Modalities included ultrasound (US), computerized tomography (CT), and X-ray images. The interpretations provided by GPT-4V were then compared with those of senior radiologists. This comparison aimed to evaluate the accuracy of GPT-4V in recognizing the imaging modality, anatomical region, and pathology present in the images.
Results: GPT-4V identified the imaging modality correctly in 100% of cases (221/221), the anatomical region in 87.1% (189/217), and the pathology in 35.2% (76/216). However, the model's performance varied significantly across modalities: anatomical region identification accuracy ranged from 60.9% (39/64) in US images to 97% (98/101) in CT and 100% (52/52) in X-ray images (p < 0.001). Similarly, pathology identification ranged from 9.1% (6/66) in US images to 36.4% (36/99) in CT and 66.7% (34/51) in X-ray images (p < 0.001). These variations indicate inconsistencies in GPT-4V's ability to interpret radiological images accurately.
Conclusion: While the integration of AI in radiology, exemplified by multimodal GPT-4, offers promising avenues for diagnostic enhancement, the current capabilities of GPT-4V are not yet reliable for interpreting radiological images. This study underscores the necessity for ongoing development to achieve dependable performance in radiology diagnostics.
Clinical relevance statement: Although GPT-4V shows promise in radiological image interpretation, its high diagnostic hallucination rate (> 40%) indicates it cannot be trusted for clinical use as a standalone tool. Improvements are necessary to enhance its reliability and ensure patient safety. Key Points...
Pages: 1959-1965
Page count: 7
Related Papers
50 items in total
  • [41] Assessing the quality of automatic-generated short answers using GPT-4
    Rodrigues L.
    Dwan Pereira F.
    Cabral L.
    Gašević D.
    Ramalho G.
    Ferreira Mello R.
    Computers and Education: Artificial Intelligence, 2024, 7
  • [42] Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis
    Hou, Wenpin
    Ji, Zhicheng
    NATURE METHODS, 2024, 21 (04) : 1462 - 1465
  • [43] GPT-4 for triaging ophthalmic symptoms
    Ethan Waisberg
    Joshua Ong
    Nasif Zaman
    Sharif Amit Kamran
    Prithul Sarker
    Alireza Tavakkoli
    Andrew G. Lee
    Eye, 2023, 37 : 3874 - 3875
  • [44] ChatGPT/GPT-4 and Spinal Surgeons
    Amnuay Kleebayoon
    Viroj Wiwanitkit
    Annals of Biomedical Engineering, 2023, 51 : 1657 - 1657
  • [45] Short answer scoring with GPT-4
    Jiang, Lan
    Bosch, Nigel
    PROCEEDINGS OF THE ELEVENTH ACM CONFERENCE ON LEARNING@SCALE, L@S 2024, 2024, : 438 - 442
  • [47] Reply to "Performance of GPT-4 Vision on kidney pathology exam questions"
    Miao, Jing
    Thongprayoon, Charat
    Cheungpasitporn, Wisit
    Cornell, Lynn D.
    AMERICAN JOURNAL OF CLINICAL PATHOLOGY, 2024,
  • [48] Is GPT-4 a Good Data Analyst?
    Cheng, Liying
    Li, Xingxuan
    Bing, Lidong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 9496 - 9514
  • [49] Using Natural Language Processing (GPT-4) for Computed Tomography Image Analysis of Cerebral Hemorrhages in Radiology: Retrospective Analysis
    Zhang, Daiwen
    Ma, Zixuan
    Gong, Ru
    Lian, Liangliang
    Li, Yanzhuo
    He, Zhenghui
    Han, Yuhan
    Hui, Jiyuan
    Huang, Jialin
    Jiang, Jiyao
    Weng, Weiji
    Feng, Junfeng
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [50] The model student: GPT-4 performance on graduate biomedical science exams
    Stribling, Daniel
    Xia, Yuxing
    Amer, Maha K.
    Graim, Kiley S.
    Mulligan, Connie J.
    Renne, Rolf
    SCIENTIFIC REPORTS, 2024, 14 (01)