Nested Entity Recognition Approach in Chinese Medical Text

Authors
Yan J.-H. [1]
Zong C.-Q. [1, 2]
Xu J.-A. [1]
Affiliations
[1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing
[2] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2024, Vol. 35, No. 6
Keywords
boundary detection; Chinese text; entity recognition; medical field; nested entity recognition
DOI
10.13328/j.cnki.jos.006927
Abstract
Entity recognition is a key technology for information extraction. Compared with ordinary text, Chinese medical text contains a large number of nested entities, which complicates entity recognition. Previous entity recognition methods often ignore the entity nesting rules specific to medical text and apply sequence labeling directly. Therefore, a Chinese entity recognition method that incorporates entity nesting rules is proposed. During training, the method recasts entity recognition as the joint learning of two tasks: entity boundary recognition and recognition of the head-tail relationship between boundaries. During decoding, it filters the results using entity nesting rules summarized from actual medical text, so that the recognized entities conform to the way inner and outer entities combine in real text. Experiments on a public medical-text entity recognition dataset show that the proposed method significantly outperforms existing methods on nested entities, and that overall accuracy improves by 0.5% over the state-of-the-art methods. © 2024 Chinese Academy of Sciences. All rights reserved.
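The decoding idea in the abstract (pair predicted boundaries through head-tail links, then filter by nesting rules) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Span` type, the function names, the entity labels, the contents of `ALLOWED_NESTING`, and the choice to drop the inner span when a rule is violated are all hypothetical assumptions made for illustration.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # index of the first character
    end: int    # index of the last character (inclusive)
    label: str


# Hypothetical nesting rules: allowed (outer label, inner label) pairs.
# The abstract says such rules are summarized from real medical text;
# these particular pairs are invented for the example.
ALLOWED_NESTING = {("disease", "body"), ("symptom", "body")}


def decode_spans(starts, ends, links):
    """Pair predicted start and end boundaries.

    `links` maps a (start, end) pair to a label and stands in for the
    output of a head-tail relationship classifier (an assumption).
    """
    return [
        Span(s, e, links[(s, e)])
        for s in sorted(starts)
        for e in sorted(ends)
        if s <= e and (s, e) in links
    ]


def contains(outer, inner):
    """True if `outer` strictly encloses `inner`."""
    same = (outer.start, outer.end) == (inner.start, inner.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same


def filter_by_rules(spans, allowed=ALLOWED_NESTING):
    """Keep a span only if every span enclosing it forms an allowed pair.

    Dropping the inner span on a violation is a simplification chosen
    for this sketch, not a detail taken from the paper.
    """
    kept = []
    for inner in spans:
        outers = [o for o in spans if contains(o, inner)]
        if all((o.label, inner.label) in allowed for o in outers):
            kept.append(inner)
    return kept
```

For example, with candidate spans `disease` over characters 0 to 5, `body` over 0 to 3, and a spurious `drug` over 2 to 5, `filter_by_rules` keeps the first two and discards the `drug` span, because `("disease", "drug")` is not in the assumed rule set.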
Pages: 2923-2935
Page count: 12