Comprehensive testing of large language models for extraction of structured data in pathology

Authors
Bastian Grothey [1]
Jan Odenkirchen [2]
Adnan Brkic [1]
Birgid Schömig-Markiefka [1]
Alexander Quaas [1]
Reinhard Büttner [1]
Yuri Tolkach [1]
Affiliations
[1] University Hospital Cologne, Institute of Pathology
[2] University of Cologne, Medical Faculty
DOI
10.1038/s43856-025-00808-8
Abstract
Pathology departments produce many diagnostic reports as free text, which is hard to analyze or use in research and computational projects. Converting this free text into standardized, structured information, such as test results or diagnoses, makes it easier to use. This task often requires human experts and takes time. Large language models (LLMs), which are advanced computer systems designed to understand and generate human-like text, might simplify this process. Here, we tested six LLMs, including freely available models and the commercial GPT-4 model, using 579 pathology reports in English and German. Our results show that freely available models can perform as well as commercial ones, providing a cheaper solution while avoiding privacy concerns. The shared dataset will support future research in pathology data processing.