A Web Semantic-Based Text Analysis Approach for Enhancing Named Entity Recognition Using PU-Learning and Negative Sampling

Authors
Zhang, Shunqin [1]
Zhang, Sanguo [1]
He, Wenduo [2]
Zhang, Xuan [2]
Affiliations
[1] Univ Chinese Acad Sci, Sch Math Sci, Beijing, Peoples R China
[2] Tsinghua Univ, Inst Network Sci & Cyberspace INSC, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Negative Sampling; NER; PU-Learning; Robustness; Self-Denoising; Token-Level; Two-Step Procedure; Unlabeled Entity Problem; CLASSIFICATION;
DOI
10.4018/IJSWIS.335113
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Named entity recognition (NER) has largely been developed on well-annotated data. In many scenarios, however, entities are not fully annotated, which seriously degrades performance. To address this issue, the authors propose a robust NER approach that combines a novel PU-learning algorithm with negative sampling. Unlike many existing studies, the proposed method handles unlabeled entities in a two-step procedure, which strengthens its ability to mitigate their impact. The algorithm is also highly versatile and can easily be integrated into any token-level NER model. Experiments on several classic NER models and datasets verify its effectiveness at handling unlabeled entities, and the authors achieve competitive performance on both synthetic and real-world datasets.
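The abstract names two ingredients, negative sampling and PU-learning, without giving the authors' two-step algorithm. The sketch below illustrates only the generic, textbook forms of those two ingredients at the token level and is not the authors' method. It assumes a BIO tagging scheme in which the tag "O" marks tokens that are unlabeled (and may therefore hide missed entities), and it assumes the entity class prior is known. The function names `sample_negative_tokens`, `sigmoid_loss`, and `nn_pu_risk` are hypothetical. The PU part follows the non-negative PU risk estimator of Kiryo et al. (2017).

```python
import math
import random
from typing import List, Sequence


def sample_negative_tokens(labels: Sequence[str], ratio: float, seed: int = 0) -> List[int]:
    """Return the token indices that contribute to the training loss.

    Assumption (hypothetical helper, BIO tags): every token with an entity tag
    is kept, while only a random subset of "O" tokens is kept as negatives.
    Because some "O" tokens may be unlabeled entities, sampling them lowers the
    chance that a missed entity is trained as a negative example.
    """
    rng = random.Random(seed)
    positives = [i for i, tag in enumerate(labels) if tag != "O"]
    unlabeled = [i for i, tag in enumerate(labels) if tag == "O"]
    # Number of negatives is a fraction of sentence length, capped by availability.
    k = min(len(unlabeled), math.ceil(ratio * len(labels)))
    negatives = rng.sample(unlabeled, k)
    return sorted(positives + negatives)


def sigmoid_loss(score: float, target: int) -> float:
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)) for y in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(target * score))


def nn_pu_risk(pos_scores: Sequence[float], unl_scores: Sequence[float], prior: float) -> float:
    """Non-negative PU risk (Kiryo et al., 2017) for a binary entity scorer.

    pos_scores: scores of tokens labeled as entities (positives).
    unl_scores: scores of unlabeled tokens (a mix of entities and non-entities).
    prior:      assumed fraction of entity tokens among all tokens.
    """
    r_pos_plus = sum(sigmoid_loss(s, +1) for s in pos_scores) / len(pos_scores)
    r_pos_minus = sum(sigmoid_loss(s, -1) for s in pos_scores) / len(pos_scores)
    r_unl_minus = sum(sigmoid_loss(s, -1) for s in unl_scores) / len(unl_scores)
    # Clamping the estimated negative-class risk at zero prevents overfitting.
    return prior * r_pos_plus + max(0.0, r_unl_minus - prior * r_pos_minus)


if __name__ == "__main__":
    tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "O", "O"]
    print(sample_negative_tokens(tags, ratio=0.25, seed=0))
    print(nn_pu_risk([2.0, 1.5], [0.3, -1.0, -2.0], prior=0.2))
```

In this illustration, negative sampling reduces how often possibly-missed entities are treated as negatives, whereas the PU risk treats "O" tokens explicitly as unlabeled rather than negative. How the paper actually combines the two in its two-step procedure is described in the full text, not here.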
Pages: 23
Related papers (50 in total)
  • [21] Text Summarization based Named Entity Recognition for Certain Application using BERT
    Tummala, Indira Priyadarshini
    2024 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT CYBER PHYSICAL SYSTEMS AND INTERNET OF THINGS, ICOICI 2024, 2024, : 1136 - 1141
  • [22] Chinese Named Entity Recognition for Hazard And Operability Analysis Text Based on Albert
    Wang, Zhenhua
    Zhang, Beike
    Gao, Dong
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 5641 - 5645
  • [23] Semantic-Based Text Document Clustering Using Cognitive Semantic Learning and Graph Theory
    Ali, Ismael
    Melton, Austin
    2018 IEEE 12TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2018, : 243 - 247
  • [24] Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning
    He, Hang
    Ma, Chao
    Ye, Shan
    Tang, Wenqiang
    Zhou, Yuxuan
    Yu, Zhen
    Yi, Jiaxin
    Hou, Li
    Hou, Mingcai
    JOURNAL OF EARTH SCIENCE, 2024, 35 (03) : 1035 - 1043
  • [26] Research on medical text named entity recognition based on Two-stage approach
    Sun, Fuquan
    Xu, Ximeng
    Dong, Xinyi
    PROCEEDINGS OF 2024 3RD INTERNATIONAL CONFERENCE ON CYBER SECURITY, ARTIFICIAL INTELLIGENCE AND DIGITAL ECONOMY, CSAIDE 2024, 2024, : 365 - 369
  • [27] Arabic Location Named Entity Recognition for Tweets using a Deep Learning Approach
    Alzaidi, Bedour Swayelh
    Abushark, Yoosef
    Khan, Asif Irshad
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (12) : 76 - 83
  • [28] SEMANTIC-BASED SENTENCE RECOGNITION IN IMAGES USING BIMODAL DEEP LEARNING
    Zheng, Yi
    Wang, Qitong
    Betke, Margrit
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2753 - 2757
  • [29] Multicore based least confidence query sampling strategy to speed up active learning approach for named entity recognition
    Agrawal, Ankit
    Tripathi, Sarsij
    Vardhan, Manu
    COMPUTING, 2023, 105 (05) : 979 - 997