A Web Semantic-Based Text Analysis Approach for Enhancing Named Entity Recognition Using PU-Learning and Negative Sampling

Times cited: 0
Authors
Zhang, Shunqin [1]
Zhang, Sanguo [1]
He, Wenduo [2]
Zhang, Xuan [2]
Affiliations
[1] Univ Chinese Acad Sci, Sch Math Sci, Beijing, Peoples R China
[2] Tsinghua Univ, Inst Network Sci & Cyberspace INSC, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Negative Sampling; NER; PU-Learning; Robustness; Self-Denoising; Token-Level; Two-Step Procedure; Unlabeled Entity Problem; Classification
DOI
10.4018/IJSWIS.335113
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Named entity recognition (NER) models are largely developed on well-annotated data. In many scenarios, however, entities are not fully annotated, which leads to serious performance degradation. To address this issue, the authors propose a robust NER approach that combines a novel PU-learning (positive-unlabeled learning) algorithm with negative sampling. Unlike many existing studies, the proposed method handles unlabeled entities with a two-step procedure, which strengthens its ability to mitigate their impact. The algorithm is also versatile and can be integrated into any token-level NER model with ease. Its effectiveness is verified on several classic NER models and datasets, where it demonstrates a strong ability to handle unlabeled entities and achieves competitive performance on both synthetic and real-world datasets.
Pages: 23