Data Augmentation for Chinese Clinical Named Entity Recognition

Authors
Wang P.-H. [1 ]
Li M.-Z. [1 ]
Li S. [1 ]
Affiliation
[1] School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing
Source
Corresponding author: Li, Si (lisi@bupt.edu.cn) | Journal of Beijing University of Posts and Telecommunications, 2020, Vol. 43
Keywords
Data augmentation; Generative adversarial network; Named entity recognition
DOI
10.13190/j.jbupt.2020-032
Abstract
Chinese clinical named entity recognition (NER) plays an important role in identifying the medical entities contained in Chinese electronic medical records. Because large annotated datasets are lacking, most existing methods rely on external resources to improve clinical NER performance, which demands considerable time and carefully designed rules. To address this shortage of annotated data, a sequence generative adversarial network is used for data augmentation, generating more diverse training data from the entities and non-entities in the training set. Experiments show that when the generated data are used to expand the training set, the proposed NER system achieves performance competitive with state-of-the-art methods, demonstrating the effectiveness of the proposed data augmentation method. © 2020, Editorial Department of Journal of Beijing University of Posts and Telecommunications. All rights reserved.
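The abstract describes generating new training sentences from the entities and non-entities of the training set. One way to picture the bookkeeping around such a generator is to treat each BIO-tagged sentence as a sequence of segments (an entity span with its type, or a run of non-entity characters), so that anything emitted as segments can be turned back into labeled NER training data. The sketch below assumes character-level BIO tagging and a hypothetical entity type `DISEASE`; the function names are illustrative, the sequence GAN itself is not shown, and this is not the authors' implementation.

```python
from typing import List, Tuple

# A segment is (text, label). label is an entity type (e.g. the hypothetical
# "DISEASE") or "O" for a run of non-entity characters.
Segment = Tuple[str, str]


def bio_to_segments(chars: List[str], tags: List[str]) -> List[Segment]:
    """Split a BIO-tagged character sequence into entity and non-entity segments."""
    segments: List[Segment] = []
    buf: List[str] = []
    cur = ""  # label of the segment currently being built ("" = none yet)
    for ch, tag in zip(chars, tags):
        if tag == "O":
            label, starts_new = "O", cur != "O"
        elif tag.startswith("B-"):
            label, starts_new = tag[2:], True
        else:  # "I-X" continues an X segment; a stray I- tag opens a new one
            label = tag[2:]
            starts_new = cur != label
        if starts_new and buf:
            segments.append(("".join(buf), cur))
            buf = []
        buf.append(ch)
        cur = label
    if buf:
        segments.append(("".join(buf), cur))
    return segments


def segments_to_bio(segments: List[Segment]) -> Tuple[List[str], List[str]]:
    """Turn a segment sequence (e.g. generator output) back into BIO-tagged characters."""
    chars: List[str] = []
    tags: List[str] = []
    for text, label in segments:
        for i, ch in enumerate(text):
            chars.append(ch)
            if label == "O":
                tags.append("O")
            else:
                tags.append(("B-" if i == 0 else "I-") + label)
    return chars, tags


if __name__ == "__main__":
    chars = list("患者有高血压史")
    tags = ["O", "O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O"]
    print(bio_to_segments(chars, tags))
    # [('患者有', 'O'), ('高血压', 'DISEASE'), ('史', 'O')]

    # Hand-written stand-in for a generated segment sequence (no GAN here).
    generated = [("患者有", "O"), ("糖尿病", "DISEASE"), ("史", "O")]
    new_chars, new_tags = segments_to_bio(generated)
    print("".join(new_chars), new_tags)
    # 患者有糖尿病史 ['O', 'O', 'O', 'B-DISEASE', 'I-DISEASE', 'I-DISEASE', 'O']
```

Working at the segment level keeps entity boundaries intact, so every generated sentence arrives with consistent labels and can be appended directly to the training set.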
Pages: 84-90