A Fusion Model for Chinese Electronic Medical Record Parsing

Authors
Jiang Z.-P. [1, 2]
Guan Y. [1 ]
Affiliations
[1] Web Intelligence Laboratory, Language Technology Center, Harbin Institute of Technology, Harbin
[2] Changan Communication Technology Co., LTD, Beijing
Funding
National Natural Science Foundation of China
Keywords
Chinese electronic medical record (CEMR); Data-oriented parsing (DOP); Full parsing; Hierarchical parsing
DOI
10.16383/j.aas.2018.c170219
Abstract
Full parsing is an important structuring step in natural language processing (NLP). However, research on full parsing of Chinese electronic medical records (CEMR) is currently absent because no syntactically annotated CEMR corpus exists. To exploit the strongly patterned sub-language of CEMR, reused patterns are first formalized as tree fragments, and a model integrating data-oriented parsing (DOP) and hierarchical parsing is proposed. In the tree fragment extraction stage, we propose a more efficient standard tree fragment extraction algorithm that avoids repeated comparison of standard tree fragments, and a partial tree fragment extraction algorithm that replaces the inefficient quadratic tree kernel (QTK) algorithm; together they yield a standard tree fragment set and a partial tree fragment set. Based on these two sets, a strategy that matches words and part-of-speech (POS) tags synchronously and a maximal combination algorithm for tree fragments are proposed to improve DOP and to alleviate the noise caused by invalid tree fragments. Experimental results show that the fusion model based on DOP and hierarchical parsing effectively improves parsing performance on CEMR: with only a small annotated corpus, the F1 score reaches a best value of 80.87 %, which in cross-department parsing is even 2 % higher than those of the two state-of-the-art parsers, Stanford and Berkeley. Copyright © 2019 Acta Automatica Sinica. All rights reserved.
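To make the notion of a tree fragment concrete, the following is a minimal sketch of the classic DOP-style fragment enumeration, in which every nonterminal child of a node is either cut (left as a frontier nonterminal) or expanded further. It is not the efficient extraction algorithm proposed in the paper; the tuple-based tree encoding, the function names, and the English toy tree are all assumptions made for illustration.

```python
from itertools import product

# Assumed encoding: a tree is a nested tuple (label, child, ...);
# a bare string is a word. A one-element tuple such as ("NN",)
# stands for a frontier nonterminal, i.e. a cut point.

def fragments_rooted_at(node):
    """Return every DOP-style fragment whose root is `node`."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            # A word under a pre-terminal is always kept.
            options.append([child])
        else:
            # Either cut at this child or expand it with one of its fragments.
            options.append([(child[0],)] + fragments_rooted_at(child))
    # A fragment keeps all children of a node, each in one chosen form.
    return [(label, *combo) for combo in product(*options)]

def all_fragments(tree):
    """Collect the fragments rooted at every nonterminal node of `tree`."""
    found = list(fragments_rooted_at(tree))
    for child in tree[1:]:
        if not isinstance(child, str):
            found.extend(all_fragments(child))
    return found

# Hypothetical toy tree, not taken from the CEMR corpus.
tree = ("NP", ("NN", "chest"), ("NN", "pain"))
for fragment in all_fragments(tree):
    print(fragment)
```

The number of fragments grows multiplicatively with the number of children that can be cut or expanded, which is one reason efficient extraction of only the reused fragments matters in practice.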
Pages: 276-288
Page count: 12