Noise Detection for Distant Supervised Named Entity Recognition

Authors
Wang J. [1]
Wang K. [1]
Wang H. [2]
Du W. [3]
He Z. [3]
Ruan T. [1]
Liu J. [1]
Affiliations
[1] School of Information Science and Engineering, East China University of Science and Technology, Shanghai
[2] College of Design and Innovation, Tongji University, Shanghai
[3] DS Information Technology Co., Ltd., Shanghai
Keywords
deep reinforcement learning; distant supervision; named entity recognition; noise detection; pre-training strategy
DOI
10.7544/issn1000-1239.202220999
Abstract
Many approaches to distantly supervised named entity recognition (NER) use reinforcement learning, exploiting its strong decision-making ability to detect noise in the automatically labeled data that distant supervision generates. However, the policy networks they use typically have simple structures, which weakens their ability to recognize noisy instances. Furthermore, correct instances are identified at the sentence level, so part of the useful information in a sentence is discarded. In this paper, we propose a new reinforcement learning based method for distantly supervised NER, named RLTL-DSNER, which detects correct instances at the token level in the noisy data generated by distant supervision, with the aim of reducing the negative impact of noisy instances on the distantly supervised NER model. Specifically, we introduce a tag confidence function to identify correct instances accurately. In addition, we propose a novel pre-training strategy for the NER model. This strategy provides accurate state representations and effective reward values for the initial training of the reinforcement learning model, which helps guide its updates in the right direction. We conduct experiments on four datasets to verify the superiority of RLTL-DSNER, which achieves a 4.28% F1 improvement on the NEWS dataset over state-of-the-art methods. © 2024 Science Press. All rights reserved.
Pages: 916-928
Page count: 12