Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports

Times cited: 22
Authors
Hasani, Amir M. [1]
Singh, Shiva [2]
Zahergivar, Aryan [2]
Ryan, Beth [3]
Nethala, Daniel [3]
Bravomontenegro, Gabriela [3]
Mendhiratta, Neil [3]
Ball, Mark [3]
Farhadi, Faraz [2]
Malayeri, Ashkan [2]
Affiliations
[1] NHLBI, Lab Translat Res, NIH, Bethesda, MD USA
[2] NIH, Radiol & Imaging Sci Dept, Clin Ctr, Bethesda, MD 20892 USA
[3] NCI, Urol Oncol Branch, NIH, Bethesda, MD USA
Funding
National Institutes of Health (NIH)
Keywords
Artificial intelligence; Natural language processing; Digital health; Machine learning
DOI
10.1007/s00330-023-10384-x
Chinese Library Classification (CLC) numbers
R8 [Special medicine]; R445 [Diagnostic imaging]
Subject classification codes
1002; 100207; 1009
Abstract
Objective
Radiology reporting is an essential component of clinical diagnosis and decision-making. With the advent of advanced artificial intelligence (AI) models such as GPT-4 (Generative Pre-trained Transformer 4), there is growing interest in evaluating their potential for optimizing or generating radiology reports. This study aimed to compare the quality and content of radiologist-generated and GPT-4-generated radiology reports.
Methods
In this comparative study, 100 anonymized radiology reports were randomly selected and analyzed. Each report was processed by GPT-4 to generate a corresponding AI-generated report. Quantitative and qualitative analyses were used to assess similarities and differences between the two sets of reports.
Results
The AI-generated reports showed quality comparable to the radiologist-generated reports in most categories. Significant differences favoring the AI-generated reports were observed in clarity (p = 0.027), ease of understanding (p = 0.023), and structure (p = 0.050). AI-generated reports were more concise, with 34.53 fewer words and 174.22 fewer characters on average, but showed greater variability in sentence length. Content similarity was high, with an average cosine similarity of 0.85, Sequence Matcher similarity of 0.52, BLEU score of 0.5008, and BERTScore F1 of 0.8775.
Conclusion
The results of this proof-of-concept study suggest that GPT-4 can be a reliable tool for generating standardized radiology reports, offering potential benefits such as improved efficiency, better communication, and simpler data extraction and analysis. However, limitations and ethical implications must be addressed to ensure the safe and effective implementation of this technology in clinical practice.
Clinical relevance statement
The findings of this study suggest that GPT-4 (Generative Pre-trained Transformer 4), an advanced AI model, has the potential to contribute significantly to the standardization and optimization of radiology reporting, offering improved efficiency and communication in clinical practice.
Key Points
• Large language model-generated radiology reports exhibited high content similarity and moderate structural resemblance to radiologist-generated reports.
• Performance metrics showed strong matching of word selection and order, as well as high semantic similarity, between AI- and radiologist-generated reports.
• The large language model demonstrated potential for generating standardized radiology reports, improving efficiency and communication in clinical settings.
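The Results report several text-similarity and length metrics but do not say how they were computed. As an illustration only, the Python sketch below computes rough analogues of some of them for a single radiologist/AI report pair. Everything about the computation is an assumption, not the authors' method: cosine similarity uses raw term-frequency vectors over lowercase regex tokens; "Sequence Matcher similarity" is taken to mean Python's `difflib.SequenceMatcher.ratio()` on the raw strings (the name suggests this, but the abstract does not confirm it); words are whitespace-separated tokens; character counts include spaces; and sentence-length variability is the population standard deviation of per-sentence word counts after naive splitting on `.`, `!`, and `?`. The helper names (`compare_reports` and the others) are hypothetical. BLEU and BERTScore are omitted; existing implementations such as NLTK's `sentence_bleu` and the `bert-score` package cover them.

```python
import difflib
import math
import re
import statistics
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (assumed tokenization, not the study's)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of raw term-frequency vectors (assumed weighting)."""
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ta[w] * tb[w] for w in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sequence_matcher_similarity(a: str, b: str) -> float:
    """difflib ratio 2*M/T over the two character sequences."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def sentence_lengths(text: str) -> list[int]:
    """Words per sentence. Naive split on . ! ? (also splits decimals like 2.5)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(tokenize(s)) for s in sentences]


def compare_reports(radiologist: str, ai: str) -> dict[str, float]:
    """Per-pair metrics; study-level values would average these over all pairs."""
    rad_lens, ai_lens = sentence_lengths(radiologist), sentence_lengths(ai)
    return {
        "cosine": cosine_similarity(radiologist, ai),
        "sequence_matcher": sequence_matcher_similarity(radiologist, ai),
        # Positive differences mean the AI report is shorter.
        "word_diff": len(radiologist.split()) - len(ai.split()),
        "char_diff": len(radiologist) - len(ai),
        # Population standard deviation of words per sentence.
        "rad_sentence_sd": statistics.pstdev(rad_lens) if rad_lens else 0.0,
        "ai_sentence_sd": statistics.pstdev(ai_lens) if ai_lens else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical example reports, not taken from the study.
    radiologist_report = (
        "No acute findings. The liver is normal in size. "
        "There is a 2 cm cyst in the right kidney."
    )
    ai_report = "No acute findings. Liver: normal size. Right kidney: 2 cm cyst."
    for name, value in compare_reports(radiologist_report, ai_report).items():
        print(f"{name}: {value:.3f}")
```

Tokenization, raw counts versus TF-IDF weighting, and the sentence splitter all shift the resulting numbers, so values from this sketch are not expected to reproduce those reported in the study.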
Pages: 3566-3574
Page count: 9