Two-stage learning algorithm for biomedical named entity recognition

Cited by: 0
Authors
Che X.-J. [1]
Xu H. [1]
Pan M.-Y. [1]
Liu Q.-L. [1]
Affiliations
[1] College of Computer Science and Technology, Jilin University, Changchun
Keywords
computer application; convolutional neural network; named entity recognition; natural language processing; pre-training; text representation
DOI
10.13229/j.cnki.jdxbgxb.20211156
Abstract
To address the high cost of labeling named entity data and the difficulty of obtaining large amounts of labeled data in the biomedical field, this paper proposes a two-stage learning framework for biomedical named entity recognition (BioNER) under low-resource conditions. In the first stage, Word2Vec and BERT serve as base models that are pre-trained and fine-tuned to obtain domain-specific word embedding representations. In the second stage, these word embeddings are fed into a neural network composed of BiLSTM and CRF layers, which is then trained on the final task. Experiments on the Yidu-S4k dataset show that, even with a small number of labels, the proposed algorithm achieves an accuracy of 80.94%. © 2023 Editorial Board of Jilin University. All rights reserved.
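The abstract names a CRF layer on top of the BiLSTM but gives no implementation details. As a rough illustration of what such a layer does at inference time, the following minimal sketch runs Viterbi decoding over per-token emission scores and tag-to-tag transition scores. Everything in it is an assumption made for illustration: the three-tag B/I/O scheme, the score values, and the function name `viterbi_decode` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][tag] is the score of `tag` at token t (in a BiLSTM-CRF
    this would come from the BiLSTM); transitions[a][b] is the score of
    moving from tag a to tag b.
    """
    if not emissions:
        return []

    # best[t][tag]: best score of any path that ends in `tag` at token t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    # back[t - 1][tag]: previous tag on that best path.
    back = []

    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy input: three tokens, three tags.
tags = ["O", "B", "I"]
transitions = {
    "O": {"O": 0.0, "B": 0.0, "I": -100.0},  # O -> I is effectively forbidden
    "B": {"O": 0.0, "B": -1.0, "I": 1.0},
    "I": {"O": 0.0, "B": -1.0, "I": 0.5},
}
emissions = [
    {"O": 2.0, "B": 0.0, "I": 0.0},
    {"O": 0.0, "B": 1.0, "I": 1.5},
    {"O": 0.0, "B": 0.0, "I": 2.0},
]

decoded = viterbi_decode(emissions, transitions, tags)
print(decoded)  # ['O', 'B', 'I']
```

Picking the best tag per token independently would give O, I, I here, which is not a valid B/I/O sequence. The transition scores steer the decoder to O, B, I instead, which is the usual motivation for placing a CRF layer after a BiLSTM in sequence labeling.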
Pages: 2380-2387
Number of pages: 7