Adaptive entity extraction method based on distant supervision

Authors
Ge L. [1]
Zhang Y. [2]
Li W. [2]
Affiliations
[1] School of Computer Science and Technology, University of Science and Technology of China, Hefei
[2] School of Software and Microelectronics, Peking University, Beijing
Keywords
Bidirectional long short-term memory neural network; Deep learning; Domain-specific knowledge graph; Entity extraction; Knowledge graph building; Ontology design; Positive unlabeled learning; Remote supervision
DOI
10.11990/jheu.202011020
Abstract
Traditional entity extraction algorithms for domain knowledge depend mainly on the professional knowledge of experts, which demands a heavy annotation workload and makes them difficult to apply to new fields. To address this problem, this paper proposes an entity extraction algorithm based on distant supervision and applies it to the field of grain and oil storage. Under the framework of positive-unlabeled learning, the algorithm extracts entities in two stages: entity determination and entity classification. First, a bidirectional long short-term memory (BiLSTM) neural network performs binary (entity versus non-entity) identification. Second, a fully connected network identifies the entity type. Finally, the algorithm is used to extract entities for constructing a knowledge graph in the field of grain and oil storage, which verifies its feasibility. The algorithm suits entity extraction tasks with few training entity samples and reduces the corpus size required by BiLSTM-based entity extraction, while achieving results comparable to those of the classical BiLSTM-based algorithm. Copyright ©2022 Journal of Harbin Engineering University.
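The abstract does not say which positive-unlabeled objective is used to train the first-stage binary classifier. As an illustration only, the sketch below assumes a non-negative PU risk estimator with a sigmoid loss, which is one common choice. The names nn_pu_risk, pos_scores, unl_scores, and prior are hypothetical, and the scores stand in for the outputs of any binary scorer such as a BiLSTM.

```python
import math


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid loss 1 / (1 + exp(label * score)), with label in {+1, -1}."""
    z = label * score
    if z >= 0:
        e = math.exp(-z)  # numerically stable branch for large positive z
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def _mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior: float) -> float:
    """Non-negative PU risk (assumed objective, not confirmed by the abstract).

    pos_scores: classifier scores for tokens matched by the entity dictionary
    unl_scores: classifier scores for unlabeled tokens
    prior:      assumed fraction of true entities among all tokens
    """
    risk_pos_as_pos = _mean(sigmoid_loss(s, +1) for s in pos_scores)
    risk_pos_as_neg = _mean(sigmoid_loss(s, -1) for s in pos_scores)
    risk_unl_as_neg = _mean(sigmoid_loss(s, -1) for s in unl_scores)
    # Estimated risk on the unseen negatives; clamped at zero so the
    # empirical estimate cannot go negative.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    return prior * risk_pos_as_pos + max(0.0, negative_part)


if __name__ == "__main__":
    # Toy scores: positives score high, unlabeled tokens are mixed.
    print(nn_pu_risk([2.0, 1.5], [-1.0, 0.5, -2.0], prior=0.3))
```

The class prior is not given in the abstract and would have to be estimated or tuned. In this sketch the unlabeled tokens are treated as a mixture of entities and non-entities, so only the dictionary matches need labels.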
Pages: 564-571
Number of pages: 7