A lexical and syntactic analysis system for Chinese electronic medical record

Authors
Jiang, Zhipeng [1]
Dai, Xue [1]
Guan, Yi [1]
Zhao, Fangfang [1]
Affiliations
[1] Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Keywords
CEMR; Chinese word segmentation; Full parsing; Part of speech tagging; Shallow parsing
DOI
10.14257/ijunesst.2016.9.9.29
Abstract
Lexical and syntactic analysis, which includes word segmentation, part-of-speech (POS) tagging, shallow parsing and full parsing, is essential for medical language processing (MLP). However, no research on full parsing, or even on shallow parsing and POS tagging, has been carried out for Chinese electronic medical records (CEMR), because of the lack of an annotated CEMR corpus. In this paper, we build a corpus of 5,024 sentences from CEMR annotated with word segmentation, POS tags and phrase tags; 2,553 of these sentences are also annotated with full parse trees. Inter-annotator agreement is 97.56% for Chinese word segmentation, 93.34% for POS tagging, 96.5% for shallow parsing and 91.22% for full parsing. A lexical and syntactic analysis system for CEMR is developed and evaluated on this corpus. Among its components, we propose a joint model for word segmentation and POS tagging, with a transformation-based error-driven model as a correction post-processing step to alleviate error accumulation; the F1-scores for word segmentation and POS tagging are 94.39% and 93.2%, respectively. We also develop a shallow parsing model under our proposed group learning framework, which enriches word features with word embeddings learned from large unlabeled CEMRs and achieves an F1-score of 96.3%. Finally, we present a state-of-the-art full parser that combines the Berkeley parser and the Stanford parser and outperforms the best single parser by 3.68%. The evaluation results show that statistical machine learning models benefit substantially from the annotated CEMR corpus. This work lays the foundation for applying natural language processing (NLP) technologies to CEMR. © 2016 SERSC.
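The abstract reports F1-scores for word segmentation but does not say how they were computed. The sketch below shows the conventional span-based definition, in which a predicted word counts as correct only if both its start and end character offsets match a gold word. It is an illustration under that assumption, not the authors' evaluation script; the function names `to_spans` and `segmentation_prf` and the example sentence are hypothetical.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmented sentence into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Span-based precision, recall and F1 for one segmented sentence.

    Assumes both segmentations cover the same underlying character string.
    """
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations cover different text")
    gold_spans = to_spans(gold)
    pred_spans = to_spans(pred)
    correct = len(gold_spans & pred_spans)  # words with matching boundaries
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: only the first word has both boundaries right.
gold = ["患者", "无", "发热"]
pred = ["患者", "无发", "热"]
precision, recall, f1 = segmentation_prf(gold, pred)
```

A corpus-level score would normally sum the correct, predicted and gold span counts over all sentences before dividing, rather than averaging per-sentence values.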
Pages: 305-318