A lexical and syntactic analysis system for Chinese electronic medical record

Cited by: 0

Authors:

Jiang, Zhipeng [1]

Dai, Xue [1]

Guan, Yi [1]

Zhao, Fangfang [1]

Affiliation:

[1] Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China

Source:

International Journal of u- and e- Service, Science and Technology | 2016, Vol. 9, No. 9

Keywords:

CEMR - Chinese word segmentation - Full parsing - Part of speech tagging - Shallow parsing

DOI:

10.14257/ijunesst.2016.9.9.29

Abstract:

Lexical and syntactic analysis, including word segmentation, part-of-speech (POS) tagging, shallow parsing and full parsing, is essential for medical language processing (MLP). However, research on full parsing, and even on shallow parsing and POS tagging, for Chinese electronic medical records (CEMRs) has not been carried out because of the lack of an annotated CEMR corpus. In this paper, we built a corpus of 5,024 sentences from CEMRs annotated with word segmentation, POS tags and phrase tags; of these, 2,553 are annotated as full parse trees. Inter-annotator agreement was 97.56% for Chinese word segmentation, 93.34% for POS tagging, 96.5% for shallow parsing and 91.22% for full parsing. A lexical and syntactic analysis system for CEMRs was developed and evaluated on this corpus. Among its components, we proposed a joint model for word segmentation and POS tagging, with a transformation-based error-driven model as correction postprocessing to alleviate error accumulation; the F1-scores for word segmentation and POS tagging were 94.39% and 93.2%, respectively. We also developed a shallow parsing model under the group learning framework we proposed, which enriched word features with word embeddings learned from large unlabeled CEMRs and achieved an F1-score of 96.3%. Finally, we presented a state-of-the-art full parser that combines the Berkeley parser and the Stanford parser and outperforms the best single parser by 3.68%. The evaluation results show that statistical machine learning models benefit substantially from the annotated CEMR corpus. This work lays the foundation for natural language processing (NLP) technologies applied to CEMRs. © 2016 SERSC.

Pages: 305-318
