A unified framework of medical information annotation and extraction for Chinese clinical text

Cited by: 2
Authors
Zhu, Enwei [1, 2]
Sheng, Qilin [1]
Yang, Huanwan [1]
Liu, Yiyang [1, 2]
Cai, Ting [1, 2]
Li, Jinpeng [1, 2]
Affiliations
[1] Ningbo 2 Hosp, Ningbo 315010, Zhejiang, Peoples R China
[2] Univ Chinese Acad Sci, Ningbo Inst Life & Hlth Ind, Ningbo 315016, Zhejiang, Peoples R China
Keywords
Information extraction; Annotation scheme; Electronic medical record; Chinese clinical text; NEURAL-NETWORKS; CORPUS;
DOI
10.1016/j.artmed.2023.102573
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Medical information extraction consists of a group of natural language processing (NLP) tasks that collaboratively convert clinical text into pre-defined structured formats. This is a critical step in exploiting electronic medical records (EMRs). Given the recent progress in NLP technologies, model implementation and performance no longer seem to be obstacles; the bottleneck instead lies in obtaining a high-quality annotated corpus and in the overall engineering workflow. This study presents an engineering framework consisting of three tasks: medical entity recognition, relation extraction, and attribute extraction. Within this framework, the whole workflow is demonstrated, from EMR data collection through model performance evaluation. Our annotation scheme is designed to be comprehensive and compatible across the multiple tasks. With EMRs from a general hospital in Ningbo, China, and manual annotation by experienced physicians, our corpus is large in scale and high in quality. Built upon this Chinese clinical corpus, the medical information extraction system shows performance that approaches human annotation. The annotation scheme, (a subset of) the annotated corpus, and the code are all publicly released to facilitate further research.
Pages: 12
Related papers
50 items in total
  • [1] Synthetic data for annotation and extraction of family history information from clinical text
    Pål H. Brekke
    Taraka Rama
    Ildikó Pilán
    Øystein Nytrø
    Lilja Øvrelid
    Journal of Biomedical Semantics, 12
  • [2] Synthetic data for annotation and extraction of family history information from clinical text
    Brekke, Pal H.
    Rama, Taraka
    Pilan, Ildiko
    Nytro, Oystein
    Ovrelid, Lilja
    JOURNAL OF BIOMEDICAL SEMANTICS, 2021, 12 (01)
  • [3] A unified framework for text analysis in Chinese TTS
    Fu, Guohong
    Zhang, Min
    Zhou, GuoDong
    Luke, Kang-Kwong
    CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4274 : 200 - +
  • [4] Relation Extraction From Biomedical and Clinical Text: Unified Multitask Learning Framework
    Yadav, Shweta
    Ramesh, Srivastsa
    Saha, Sriparna
    Ekbal, Asif
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (02) : 1105 - 1116
  • [5] LLMs Accelerate Annotation for Medical Information Extraction
    Goel, Akshay
    Gueta, Almog
    Gilon, Omry
    Liu, Chang
    Erell, Sofia
    Lan Huong Nguyen
    Hao, Xiaohong
    Jaber, Bolous
    Reddy, Shashir
    Kartha, Rupesh
    Steiner, Jean
    Laish, Itay
    Feder, Amir
    MACHINE LEARNING FOR HEALTH, ML4H, VOL 225, 2023, 225 : 82 - 100
  • [6] A unified framework for data modeling on Medical Information Systems
    Neves, J
    Cortez, P
    Rocha, M
    Abelha, A
    Machado, J
    Alves, V
    Basto, S
    Botelho, H
    Neves, J
    MEDICAL INFORMATICS EUROPE '99, 1999, 68 : 68 - 71
  • [7] Research on entity relation extraction for Chinese medical text
    Lu, Yonghe
    Chen, Hongyu
    Zhang, Yueyun
    Peng, Jiahui
    Xiang, Dingcheng
    Zhang, Jinxia
    HEALTH INFORMATICS JOURNAL, 2024, 30 (03)
  • [8] A Text Structuring Method for Chinese Medical Text Based on Temporal Information
    Zhang, Runtong
    Chu, Fuzhi
    Chen, Donghua
    Shang, Xiaopu
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2018, 15 (03)
  • [9] Information Extraction Models for German Clinical Text
    Roller, Roland
    Seiffe, Laura
    Ayach, Ammer
    Moller, Sebastian
    Marten, Oliver
    Mikhailov, Michael
    Alt, Christoph
    Schmidt, Danilo
    Halleck, Fabian
    Naik, Marcel
    Duettmann, Wiebke
    Budde, Klemens
    2020 8TH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2020), 2020, : 527 - 528
  • [10] Extraction of lexico-syntactic information and acquisition of causality schemes for text annotation
    Alamarguy, L
    Dieng-Kuntz, R
    Faron-Zucker, C
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 3, PROCEEDINGS, 2005, 3683 : 1180 - 1186