A unified framework of medical information annotation and extraction for Chinese clinical text

Cited by: 2
Authors
Zhu, Enwei [1, 2]
Sheng, Qilin [1]
Yang, Huanwan [1]
Liu, Yiyang [1, 2]
Cai, Ting [1, 2]
Li, Jinpeng [1, 2]
Affiliations
[1] Ningbo 2 Hosp, Ningbo 315010, Zhejiang, Peoples R China
[2] Univ Chinese Acad Sci, Ningbo Inst Life & Hlth Ind, Ningbo 315016, Zhejiang, Peoples R China
Keywords
Information extraction; Annotation scheme; Electronic medical record; Chinese clinical text; NEURAL-NETWORKS; CORPUS;
DOI
10.1016/j.artmed.2023.102573
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Medical information extraction consists of a group of natural language processing (NLP) tasks that collaboratively convert clinical text into pre-defined structured formats. This is a critical step in exploiting electronic medical records (EMRs). Given recent advances in NLP technologies, model implementation and performance no longer seem to be an obstacle; the bottleneck instead lies in obtaining a high-quality annotated corpus and in the whole engineering workflow. This study presents an engineering framework consisting of three tasks: medical entity recognition, relation extraction, and attribute extraction. Within this framework, the whole workflow is demonstrated, from EMR data collection through model performance evaluation. Our annotation scheme is designed to be comprehensive and compatible across the multiple tasks. With EMRs from a general hospital in Ningbo, China, and manual annotation by experienced physicians, our corpus is of large scale and high quality. Built upon this Chinese clinical corpus, the medical information extraction system shows performance that approaches human annotation. The annotation scheme, (a subset of) the annotated corpus, and the code are all publicly released to facilitate further research.
Pages: 12