Semi-supervised geological disasters named entity recognition using few labeled data

Cited by: 8
Authors
Lei, Xinya [1, 2]
Song, Weijing [1, 2]
Fan, Runyu [1, 2]
Feng, Ruyi [1, 2]
Wang, Lizhe [1, 2]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[2] Hubei Key Lab Intelligent Geoinformat Proc, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Geological disasters named entity recognition; Semi-supervised learning; Self-training; Pre-trained BERT model; Named entity recognition
DOI
10.1007/s10707-022-00474-1
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Geological disaster Named Entity Recognition (NER) aims to recognize entities that reflect disaster event information in unstructured text, so that a geohazard knowledge graph can be constructed to provide a reference for disaster emergency response. Without training on large-scale labeled data, current deep-learning-based NER methods cannot identify specific geological disaster entities in geological disaster situation reports. However, manually labeling geohazard situation reports is tedious and time-consuming. We therefore present Semi-GDNER, a semi-supervised geological disaster NER approach that can effectively extract six kinds of geological disaster entities when only a small amount of manually labeled data, together with unlabeled in-domain data, is available. It is divided into two stages: (1) transferring the parameters of the pre-trained BERT-base model to the BERT layer of the backbone model BERT-BiLSTM-CRF and training the backbone model with the few labeled data; (2) continuing to train the backbone model by expanding the training set with unlabeled data using a self-training (ST) strategy. To reduce noise in the second stage, only the pseudo-labeled samples with high confidence are selected to join the training set in each ST iteration. Experiments on our constructed Geological Disaster NER dataset show that our approach achieves a higher F1 (0.88) than other NER approaches (five supervised NER approaches and a semi-supervised NER approach whose ST strategy expands the training set with all pseudo-labeled data), demonstrating the effectiveness of our approach. Furthermore, experiments on four general Chinese NER datasets show that the framework of our approach is transferable.
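The second stage described in the abstract (self-training that admits only high-confidence pseudo-labels) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the BERT-BiLSTM-CRF backbone is replaced by a toy 1-D nearest-centroid classifier so the loop runs without dependencies, and the names `fit_centroids`, `predict_with_confidence`, `self_train`, the `threshold` of 0.9 and the `max_iters` limit are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Toy stand-in for (sentence, tag sequence): a 1-D feature and a class label.
Example = Tuple[float, int]


def fit_centroids(data: Sequence[Example]) -> Dict[int, float]:
    """Toy 'training' step (assumption): store the mean feature of each class."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_confidence(model: Dict[int, float], x: float) -> Tuple[int, float]:
    """Return (label, confidence); confidence is a softmax over negative distances."""
    scores = {y: math.exp(-abs(x - c)) for y, c in model.items()}
    total = sum(scores.values())
    label = max(scores, key=lambda y: scores[y])
    return label, scores[label] / total


def self_train(
    labeled: Sequence[Example],
    unlabeled: Sequence[float],
    threshold: float = 0.9,   # hypothetical confidence cut-off
    max_iters: int = 5,       # hypothetical iteration limit
) -> Tuple[Dict[int, float], List[Example], List[float]]:
    train = list(labeled)
    pool = list(unlabeled)
    model = fit_centroids(train)            # stage 1: train on the few labeled data
    for _ in range(max_iters):              # stage 2: self-training iterations
        confident: List[Example] = []
        remaining: List[float] = []
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:
                confident.append((x, label))    # keep the high-confidence pseudo-label
            else:
                remaining.append(x)             # leave it unlabeled for a later round
        if not confident:
            break                           # nothing new to learn from
        train.extend(confident)
        pool = remaining
        model = fit_centroids(train)        # retrain on the expanded training set
    return model, train, pool
```

Calling `self_train([(0.0, 0), (10.0, 1)], [0.5, 9.5, 5.0, 1.0, 9.0])` pseudo-labels the four points near the class centroids and leaves the ambiguous midpoint `5.0` in the unlabeled pool. Setting `threshold=0.0` would instead admit every pseudo-label, which corresponds to the all-pseudo-labeled-data baseline the abstract compares against.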
Pages: 263-288
Number of pages: 26