Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports

Cited by: 22
Authors
Hasani, Amir M. [1]
Singh, Shiva [2]
Zahergivar, Aryan [2]
Ryan, Beth [3]
Nethala, Daniel [3]
Bravomontenegro, Gabriela [3]
Mendhiratta, Neil [3]
Ball, Mark [3]
Farhadi, Faraz [2]
Malayeri, Ashkan [2]
Affiliations
[1] NHBLI, Lab Translat Res, NIH, Bethesda, MD USA
[2] NIH, Radiol & Imaging Sci Dept, Clin Ctr, Bethesda, MD 20892 USA
[3] NCI, Urol Oncol Branch, NIH, Bethesda, MD USA
Funding
National Institutes of Health (USA)
Keywords
Artificial intelligence; Natural language processing; Digital health; Machine learning;
DOI
10.1007/s00330-023-10384-x
Chinese Library Classification (CLC) number
R8 [Special medicine]; R445 [Diagnostic imaging];
Discipline classification code
1002 ; 100207 ; 1009 ;
Abstract
Objective: Radiology reporting is an essential component of clinical diagnosis and decision-making. With the advent of advanced artificial intelligence (AI) models like GPT-4 (Generative Pre-trained Transformer 4), there is growing interest in evaluating their potential for optimizing or generating radiology reports. This study aimed to compare the quality and content of radiologist-generated and GPT-4 AI-generated radiology reports.
Methods: A comparative study design was employed in the study, where a total of 100 anonymized radiology reports were randomly selected and analyzed. Each report was processed by GPT-4, resulting in the generation of a corresponding AI-generated report. Quantitative and qualitative analysis techniques were utilized to assess similarities and differences between the two sets of reports.
Results: The AI-generated reports showed comparable quality to radiologist-generated reports in most categories. Significant differences were observed in clarity (p = 0.027), ease of understanding (p = 0.023), and structure (p = 0.050), favoring the AI-generated reports. AI-generated reports were more concise, with 34.53 fewer words and 174.22 fewer characters on average, but had greater variability in sentence length. Content similarity was high, with an average Cosine Similarity of 0.85, Sequence Matcher Similarity of 0.52, BLEU Score of 0.5008, and BERTScore F1 of 0.8775.
Conclusion: The results of this proof-of-concept study suggest that GPT-4 can be a reliable tool for generating standardized radiology reports, offering potential benefits such as improved efficiency, better communication, and simplified data extraction and analysis. However, limitations and ethical implications must be addressed to ensure the safe and effective implementation of this technology in clinical practice.
Clinical relevance statement: The findings of this study suggest that GPT-4 (Generative Pre-trained Transformer 4), an advanced AI model, has the potential to significantly contribute to the standardization and optimization of radiology reporting, offering improved efficiency and communication in clinical practice.
Key Points:
  • Large language model-generated radiology reports exhibited high content similarity and moderate structural resemblance to radiologist-generated reports.
  • Performance metrics highlighted the strong matching of word selection and order, as well as high semantic similarity between AI and radiologist-generated reports.
  • Large language model demonstrated potential for generating standardized radiology reports, improving efficiency and communication in clinical settings.
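Two of the similarity measures named in the abstract, plus the word and character count comparison, can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not say how the reports were tokenized or vectorized, so the bag-of-words cosine and the character-level SequenceMatcher ratio below are assumptions, and all function names are hypothetical. BLEU and BERTScore are left out because they require third-party libraries.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # Guard against empty texts, which have zero-length vectors.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sequence_similarity(a, b):
    """Order-sensitive, character-level ratio from difflib.SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def length_difference(original, rewritten):
    """Return (word_delta, char_delta); positive values mean the rewrite is shorter."""
    word_delta = len(tokenize(original)) - len(tokenize(rewritten))
    char_delta = len(original) - len(rewritten)
    return word_delta, char_delta


if __name__ == "__main__":
    # Hypothetical report snippets, invented purely for illustration.
    radiologist = "There is no evidence of focal renal lesion in either kidney."
    rewritten = "Kidneys: no focal renal lesion."
    print(cosine_similarity(radiologist, rewritten))
    print(sequence_similarity(radiologist, rewritten))
    print(length_difference(radiologist, rewritten))
```

A cosine score ignores word order while the SequenceMatcher ratio does not, which is one plausible reason a restructured report could score high on the former and only moderately on the latter.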
Pages: 3566-3574
Number of pages: 9