Active learning approach using a modified least confidence sampling strategy for named entity recognition

Cited by: 19
Authors
Agrawal, Ankit [1]
Tripathi, Sarsij [2]
Vardhan, Manu [1]
Affiliations
[1] Natl Inst Technol Raipur, Dept Comp Sci & Engn, Raipur, Chhattisgarh, India
[2] Motilal Nehru Natl Inst Technol Allahabad, Dept Comp Sci & Engn, Prayagraj, Uttar Pradesh, India
Keywords
Named entity recognition; Active learning; Least confidence; Sampling strategy; Supervised learning
DOI
10.1007/s13748-021-00230-w
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Named entity recognition (NER) is an important subtask of information extraction. Its aim is to identify named entities in textual data and classify them into predetermined categories. A large number of supervised learning and deep learning models have been developed for entity recognition, and they perform well when a labeled training set is available. Obtaining such a training set requires labeling large amounts of unlabeled data, which is both expensive and time-consuming. Active learning is an iterative approach that provides a way to minimize labeling cost without affecting performance. It uses a sampling strategy that selects the appropriate unlabeled data instances, an oracle that labels the selected instances, and a machine learning model (base classifier). In this work, a modified least confidence-based query sampling strategy for active learning in named entity recognition is proposed, which considers different numbers of uncertain words within a sentence when computing the sentence's final least confidence score for comparison. To evaluate the effectiveness of the proposed approach, the performance of active learning with the proposed sampling strategy is compared against random sampling and two other well-known uncertainty-based query sampling strategies. A real-world active learning scenario is simulated for the experiment, and the total amount of labeled data required to train the active learner until it reaches the stop condition is recorded for each sampling strategy. The experiment is carried out on the development and test sets of three different biomedical corpora and a Spanish-language NER corpus. With the proposed approach, the least labeled data is needed for training to reach that performance level compared with the other approaches. The performance of the proposed approach is found to be slightly better than that of the existing sampling approaches, and all of the uncertainty-based approaches perform far better than random sampling.
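The abstract does not give the exact scoring formula, so the following is only a minimal sketch of the general idea: score each unlabeled sentence from the least confidence values of its most uncertain words, then query the highest-scoring sentences. The aggregation used here (mean of the k largest per-word values), the parameter `k`, and the function names are assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence

# Per-token class probabilities for one sentence: one row per word,
# one column per entity label (as produced by some base classifier).
TokenProbs = Sequence[Sequence[float]]


def token_uncertainties(token_probs: TokenProbs) -> List[float]:
    """Least confidence per word: 1 minus the highest label probability."""
    return [1.0 - max(probs) for probs in token_probs]


def sentence_score(token_probs: TokenProbs, k: int) -> float:
    """Hypothetical aggregation: mean of the k most uncertain words.

    With k = 1 the sentence is judged by its single most uncertain word;
    larger k takes more uncertain words into account.
    """
    if not token_probs:
        return 0.0
    ranked = sorted(token_uncertainties(token_probs), reverse=True)
    top = ranked[: max(1, min(k, len(ranked)))]
    return sum(top) / len(top)


def select_batch(pool: Sequence[TokenProbs], k: int, batch_size: int) -> List[int]:
    """Indices of the batch_size sentences with the highest scores."""
    scores = [sentence_score(sentence, k) for sentence in pool]
    order = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return order[:batch_size]
```

In an active learning loop, the selected sentences would be sent to the oracle for labeling, added to the training set, and the base classifier retrained before the next round of selection. Varying `k` changes which sentences are ranked first, which is the kind of effect a strategy that considers different numbers of uncertain words would need to account for.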
Pages: 113-128
Number of pages: 16
Related papers
50 in total
  • [41] A New Approach for Named Entity Recognition
    Ertopcu, Burak
    Kanburoglu, Ali Bugra
    Topsakal, Ozan
    Acikgoz, Onur
    Gurkan, Ali Tunca
    Ozenc, Berke
    Cam, Ilker
    Avar, Begum
    Ercan, Gokhan
    Yildiz, Olcay Taner
    2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 474 - 479
  • [42] The ConceptMapper Approach to Named Entity Recognition
    Tanenblatt, Michael
    Coden, Anni
    Sominsky, Igor
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010,
  • [43] A Named Entity Recognition Approach for Albanian
    Skenduli, Marjana Prifti
    Biba, Marenglen
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 1532 - 1537
  • [44] Medical Named Entity Recognition Using Weakly Supervised Learning
    Long-Long Ma
    Jie Yang
    Bo An
    Shuaikang Liu
    Gaijuan Huang
    Cognitive Computation, 2022, 14 : 1068 - 1079
  • [45] Resolving Ambiguities in Named Entity Recognition Using Machine Learning
    Bhandari, Nitin
    Chowdri, Ritika
    Singh, Harmeet
    Qureshi, Salim Raza
    2017 INTERNATIONAL CONFERENCE ON NEXT GENERATION COMPUTING AND INFORMATION SYSTEMS (ICNGCIS), 2017, : 159 - 163
  • [46] Medical Named Entity Recognition Using Weakly Supervised Learning
    Ma, Long-Long
    Yang, Jie
    An, Bo
    Liu, Shuaikang
    Huang, Gaijuan
    COGNITIVE COMPUTATION, 2022, 14 (03) : 1068 - 1079
  • [47] EASAL: Entity-Aware Subsequence-Based Active Learning for Named Entity Recognition
    Liu, Yang
    Hu, Jinpeng
    Chen, Zhihong
    Wan, Xiang
    Chang, Tsung-Hui
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8897 - 8905
  • [48] Amazighe Named Entity Recognition Using a A Rule Based Approach
    Boulaknadel, Siham
    Talha, Meryem
    Aboutajdine, Driss
    2014 IEEE/ACS 11TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2014, : 478 - 484
  • [49] A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields
    Van Cuong Tran
    Ngoc Thanh Nguyen
    Fujita, Hamido
    Dinh Tuyen Hoang
    Hwang, Dosam
    KNOWLEDGE-BASED SYSTEMS, 2017, 132 : 179 - 187
  • [50] A CRF based Machine Learning Approach for Biomedical Named Entity Recognition
    Kanimozhi, U.
    Manjula, D.
    2017 SECOND INTERNATIONAL CONFERENCE ON RECENT TRENDS AND CHALLENGES IN COMPUTATIONAL MODELS (ICRTCCM), 2017, : 335 - 342