Active learning approach using a modified least confidence sampling strategy for named entity recognition

Cited by: 19
Authors
Agrawal, Ankit [1]
Tripathi, Sarsij [2]
Vardhan, Manu [1]
Affiliations
[1] Natl Inst Technol Raipur, Dept Comp Sci & Engn, Raipur, Chhattisgarh, India
[2] Motilal Nehru Natl Inst Technol Allahabad, Dept Comp Sci & Engn, Prayagraj, Uttar Pradesh, India
Keywords
Named entity recognition; Active learning; Least confidence; Sampling strategy; Supervised learning
DOI
10.1007/s13748-021-00230-w
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
One of the important subtasks of information extraction is named entity recognition (NER). Its aim is to identify the named entities in textual data and classify them into predetermined categories. A large number of supervised learning and deep learning models have been developed for the entity recognition task, and they perform well when a labeled training set is available. Obtaining such a training set, however, requires labeling a large amount of unlabeled data, which is both expensive and time-consuming. Active learning is an iterative approach that aims to minimize labeling cost without sacrificing performance. It combines a sampling strategy that selects suitable unlabeled data instances, an oracle that labels the selected instances, and a machine learning model (the base classifier). In this work, a modified least confidence-based query sampling strategy is proposed for active learning in the named entity recognition task; when computing the final least confidence score used to compare sentences, it takes into account the differing numbers of uncertain words present within the sentences. To evaluate the effectiveness of the proposed approach, active learning with the proposed sampling strategy is compared with active learning using random sampling and two other well-known uncertainty-based query sampling strategies. A real-world active learning scenario is simulated in the experiments, and for each sampling strategy the total amount of labeled data the active learner needs to reach the stop condition is recorded. The experiments are carried out on the development and test sets of three different biomedical corpora and a Spanish-language NER corpus. The proposed active learning approach is found to require the least labeled training data to reach the stop-condition performance level compared with the other approaches.
The proposed approach performs slightly better than the existing uncertainty-based sampling strategies, and all of these approaches perform far better than random sampling.
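The abstract states that the modified strategy accounts for the number of uncertain words in each sentence but does not give the formula. The sketch below is a minimal illustration under stated assumptions: it contrasts the classic sentence-level least confidence score with a hypothetical top-k variant in which only the k most uncertain words of a sentence contribute, and then picks the most uncertain sentences to send to the oracle. The input format (per-token label probabilities), the function names, the toy probabilities, and the top-k averaging rule are all assumptions for illustration, not the authors' method.

```python
import math
from typing import List, Sequence

# ASSUMPTION: a sentence is a list of per-token label distributions
# (e.g. marginals over BIO tags produced by a CRF base classifier).
Sentence = Sequence[Sequence[float]]


def token_least_confidence(dist: Sequence[float]) -> float:
    """Least confidence of one token: 1 minus the probability of its best label."""
    return 1.0 - max(dist)


def sequence_least_confidence(sentence: Sentence) -> float:
    """Classic sentence-level least confidence, approximating P(best labeling)
    by the product of per-token maxima (computed in log space)."""
    log_p = sum(math.log(max(dist)) for dist in sentence)
    return 1.0 - math.exp(log_p)


def top_k_least_confidence(sentence: Sentence, k: int) -> float:
    """HYPOTHETICAL variant, not the paper's formula: average the k largest
    token-level least confidence scores, so sentences of different lengths
    are compared on their k most uncertain words."""
    scores = sorted((token_least_confidence(d) for d in sentence), reverse=True)
    top = scores[:k]
    return sum(top) / len(top) if top else 0.0


def select_batch(pool: List[Sentence], batch_size: int, k: int) -> List[int]:
    """Indices of the batch_size most uncertain sentences, to be sent to the oracle."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: top_k_least_confidence(pool[i], k),
        reverse=True,
    )
    return ranked[:batch_size]


# Toy unlabeled pool, three labels per token (illustrative values only).
pool: List[Sentence] = [
    [[0.90, 0.05, 0.05], [0.60, 0.30, 0.10]],
    [[0.95, 0.03, 0.02], [0.50, 0.40, 0.10], [0.55, 0.25, 0.20]],
    [[0.99, 0.005, 0.005]],
]

for i, sent in enumerate(pool):
    print(i, round(sequence_least_confidence(sent), 4),
          round(top_k_least_confidence(sent, k=2), 4))
print("query:", select_batch(pool, batch_size=2, k=2))
```

The product-based score tends to grow with sentence length, since every extra token multiplies in another factor below 1; the top-k average limits how much length alone can raise a sentence's score. The choice k = 2 is arbitrary.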
Pages: 113-128
Page count: 16