An open-source fine-tuned large language model for radiological impression generation: a multi-reader performance study

Cited by: 1
Authors
Serapio, Adrian [1]
Chaudhari, Gunvant [3]
Savage, Cody [2]
Lee, Yoo Jin [1]
Vella, Maya [1]
Sridhar, Shravan [1]
Schroeder, Jamie Lee [4]
Liu, Jonathan [1]
Yala, Adam [5,6]
Sohn, Jae Ho [1]
Affiliations
[1] Univ Calif San Francisco, Dept Radiol & Biomed Imaging, San Francisco, CA 94143 USA
[2] Univ Maryland, Med Ctr, Dept Radiol, Baltimore, MD USA
[3] Univ Washington, Dept Radiol, Seattle, WA USA
[4] MedStar Georgetown Univ Hosp, Washington, DC USA
[5] Univ Calif Berkeley, Computat Precis Hlth, Berkeley, CA USA
[6] Univ Calif San Francisco, San Francisco, CA USA
Source
BMC MEDICAL IMAGING, 2024, 24 (01)
Keywords
Natural language processing; Large language model; Open-source; Summarization; Impressions
DOI
10.1186/s12880-024-01435-w
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline Classification Codes
1002; 100207; 1009
Abstract
Background: The impression section integrates the key findings of a radiology report but can be subjective and variable. We sought to fine-tune and evaluate an open-source large language model (LLM) for automatically generating impressions from the remainder of a radiology report across different imaging modalities and hospitals.

Methods: In this institutional review board-approved retrospective study, we collated a dataset of CT, US, and MRI radiology reports from the University of California San Francisco Medical Center (UCSFMC) (n = 372,716) and the Zuckerberg San Francisco General (ZSFG) Hospital and Trauma Center (n = 60,049), both under a single institution. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score, an automatic natural language evaluation metric that measures word overlap, was used for automatic evaluation. A reader study with five cardiothoracic radiologists was performed to evaluate the model's performance more strictly on a specific modality (chest CT exams) against a subspecialist radiologist baseline. We stratified the results of the reader performance study by diagnosis category and original impression length to gauge case complexity.

Results: The LLM achieved ROUGE-L scores of 46.51, 44.2, and 50.96 on UCSFMC and, upon external validation, ROUGE-L scores of 40.74, 37.89, and 24.61 on ZSFG across the CT, US, and MRI modalities, respectively, implying substantial overlap between the model-generated impressions and the impressions written by subspecialist attending radiologists, with some degradation upon external validation. In our reader study, the model-generated impressions achieved overall mean scores of 3.56/4, 3.92/4, 3.37/4, 18.29 s, 12.32 words, and 84, while the original impressions written by subspecialist radiologists achieved overall mean scores of 3.75/4, 3.87/4, 3.54/4, 12.2 s, 5.74 words, and 89 for clinical accuracy, grammatical accuracy, stylistic quality, edit time, edit distance, and ROUGE-L score, respectively. The LLM achieved the highest clinical accuracy ratings for acute/emergent findings and for shorter impressions.

Conclusions: An open-source fine-tuned LLM can generate impressions to a satisfactory level of clinical accuracy, grammatical accuracy, and stylistic quality. Our reader performance study demonstrates the potential of large language models for drafting radiology report impressions that can help streamline radiologists' workflows.
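The ROUGE-L and edit-distance figures in the abstract are straightforward to reproduce in principle. Below is a minimal sketch of both metrics, assuming Google's open-source rouge-score package and a word-level Levenshtein distance; the paper does not specify its exact implementations, and the example impressions are hypothetical, not drawn from the study's dataset.

```python
# Minimal sketch of the two automatic metrics referenced in the abstract.
# Assumes the open-source `rouge-score` package (pip install rouge-score);
# the paper does not state which ROUGE implementation it used.
from rouge_score import rouge_scorer


def word_edit_distance(a: str, b: str) -> int:
    """Word-level Levenshtein distance: a plausible reading of the reader
    study's edit-distance metric (its exact definition is not given)."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, 1):
        cur = [i]
        for j, wy in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                 # delete wx
                           cur[j - 1] + 1,              # insert wy
                           prev[j - 1] + (wx != wy)))   # substitute
        prev = cur
    return prev[-1]


# Hypothetical reference/generated impression pair, not from the study.
reference = "No acute cardiopulmonary abnormality."
generated = "No acute cardiopulmonary process identified."

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, generated)["rougeL"].fmeasure

# The paper reports ROUGE-L on a 0-100 scale.
print(f"ROUGE-L: {rouge_l * 100:.2f}")
print(f"Word edit distance: {word_edit_distance(reference, generated)}")
```

Scores from such a sketch depend on tokenization and stemming choices, so they would only approximate the paper's reported values.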
Pages: 14
Related Articles (50 records)
  • [1] Need of Fine-Tuned Radiology Aware Open-Source Large Language Models for Neuroradiology
    Ray, Partha Pratim
    CLINICAL NEURORADIOLOGY, 2024
  • [2] Exploring Generalizability of a fine-tuned Large Language Model for Impression Generation in PET Reports
    Yousefirizi, F.
    Wang, L.
    Gowdy, C.
    Shariftabrizi, A.
    Harsini, S.
    Ahamed, S.
    Sabouri, M.
    Mollaheydar, E.
    Rahmim, A.
    EUROPEAN JOURNAL OF NUCLEAR MEDICINE AND MOLECULAR IMAGING, 2024, 51: S785-S785
  • [3] A scientific-article key-insight extraction system based on multi-actor of fine-tuned open-source large language models
    Song, Zihan
    Hwang, Gyo-Yeob
    Zhang, Xin
    Huang, Shan
    Park, Byung-Kwon
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [4] CentralBankRoBERTa: A fine-tuned large language model for central bank communications
    Pfeifer, Moritz
    Marohl, Vincent P.
    JOURNAL OF FINANCE AND DATA SCIENCE, 2023, 9
  • [5] RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model
    Lu, Yao
    Liu, Shang
    Zhang, Qijun
    Xie, Zhiyao
    29TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC 2024, 2024: 722-727
  • [6] Taiyi: a bilingual fine-tuned large language model for diverse biomedical tasks
    Luo, Ling
    Ning, Jinzhong
    Zhao, Yingwen
    Wang, Zhijun
    Ding, Zeyuan
    Chen, Peng
    Fu, Weiru
    Han, Qinyu
    Xu, Guangtao
    Qiu, Yunzhi
    Pan, Dinghao
    Li, Jiru
    Li, Hao
    Feng, Wenduo
    Tu, Senbo
    Liu, Yuqi
    Yang, Zhihao
    Wang, Jian
    Sun, Yuanyuan
    Lin, Hongfei
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09): 1865-1874
  • [7] Classification of Interventional Radiology Reports into Technique Categories with a Fine-Tuned Large Language Model
    Yasaka, Koichiro
    Nomura, Takuto
    Kamohara, Jun
    Hirakawa, Hiroshi
    Kubo, Takatoshi
    Kiryu, Shigeru
    Abe, Osamu
    JOURNAL OF IMAGING INFORMATICS IN MEDICINE, 2024
  • [8] A fine-tuned large language model based molecular dynamics agent for code generation to obtain material thermodynamic parameters
    Shi, Zhuofan
    Xin, Chunxiao
    Huo, Tong
    Jiang, Yuntao
    Wu, Bowen
    Chen, Xingyue
    Qin, Wei
    Ma, Xinjian
    Huang, Gang
    Wang, Zhenyu
    Jing, Xiang
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [9] Staged Multi-Strategy Framework With Open-Source Large Language Models for Natural Language to SQL Generation
    Liu, Chuanlong
    Liao, Wei
    Xu, Zhen
    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2025
  • [10] Performance comparison of retrieval-augmented generation and fine-tuned large language models for construction safety management knowledge retrieval
    Lee, Jungwon
    Ahn, Seungjun
    Kim, Daeho
    Kim, Dongkyun
    AUTOMATION IN CONSTRUCTION, 2024, 168