Active learning approach using a modified least confidence sampling strategy for named entity recognition

Cited by: 19
Authors
Agrawal, Ankit [1]
Tripathi, Sarsij [2]
Vardhan, Manu [1]
Affiliations
[1] Natl Inst Technol Raipur, Dept Comp Sci & Engn, Raipur, Chhattisgarh, India
[2] Motilal Nehru Natl Inst Technol Allahabad, Dept Comp Sci & Engn, Prayagraj, Uttar Pradesh, India
Keywords
Named entity recognition; Active learning; Least confidence; Sampling strategy; Supervised learning;
DOI
10.1007/s13748-021-00230-w
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Named entity recognition (NER) is an important subtask of information extraction. Its aim is to identify the named entities in textual data and to classify them into predetermined categories. A large number of supervised learning and deep learning models have been developed for the entity recognition task, and they perform well when a labeled training set is available. Building such a training set requires labeling large amounts of unlabeled data, which is both expensive and time-consuming. Active learning is an iterative approach that provides a way to minimize labeling cost without affecting performance. It uses a sampling strategy that selects the appropriate unlabeled data instances, an oracle to label the selected data instances, and a machine learning model (base classifier). In this work, a modified least confidence-based query sampling strategy is proposed for active learning in named entity recognition; it considers different numbers of uncertain words within a sentence to compute the sentence's final least confidence score for comparison. To evaluate the effectiveness of the proposed approach, the performance of active learning with the proposed sampling strategy is compared against a random sampling strategy and two other well-known uncertainty query sampling strategies. A real-world active learning scenario is simulated for the experiment, and the total amount of labeled data required for training the active learner to reach the stop condition under each sampling strategy is recorded. The experiment is carried out on the development and test sets of three different biomedical corpora and a Spanish-language NER corpus. The proposed active learning approach is found to require the least labeled training data to reach that performance level in comparison with the other approaches.
The performance of the proposed approach is found to be slightly better than that of the existing sampling approaches, and all of these approaches perform far better than the random sampling approach.
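The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea of least confidence sampling over sentences, not the authors' method. It assumes each sentence is a non-empty list of per-token label probability distributions, and it assumes (hypothetically) that the sentence score is the mean least confidence of its `k` most uncertain tokens. The names `token_least_confidence`, `sentence_score`, `select_queries`, and the parameter `k` are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def token_least_confidence(token_probs: Sequence[float]) -> float:
    """Least confidence of one token: 1 minus the probability of its most likely label."""
    return 1.0 - max(token_probs)


def sentence_score(sentence_probs: Sequence[Sequence[float]], k: int) -> float:
    """Mean least confidence over the k most uncertain tokens of a sentence.

    Assumption for this sketch: k is a free parameter; the paper's actual
    aggregation over uncertain words may differ.
    """
    scores = sorted(
        (token_least_confidence(p) for p in sentence_probs), reverse=True
    )
    top = scores[: max(1, k)]  # always use at least one token
    return sum(top) / len(top)


def select_queries(
    pool: Sequence[Sequence[Sequence[float]]], k: int, batch_size: int
) -> List[int]:
    """Return indices of the batch_size most uncertain sentences in the pool."""
    ranked = sorted(
        range(len(pool)), key=lambda i: sentence_score(pool[i], k), reverse=True
    )
    return ranked[:batch_size]


# Toy unlabeled pool: three two-token sentences, two labels per token.
pool = [
    [[0.90, 0.10], [0.80, 0.20]],  # both tokens fairly confident
    [[0.55, 0.45], [0.99, 0.01]],  # one very uncertain token
    [[0.60, 0.40], [0.60, 0.40]],  # two moderately uncertain tokens
]

print(select_queries(pool, k=1, batch_size=1))  # [1]
print(select_queries(pool, k=2, batch_size=1))  # [2]
```

In this toy pool, the choice of how many uncertain tokens to count changes which sentence is sent to the oracle: with `k=1` the sentence with the single most uncertain token is selected, while with `k=2` the sentence that is uncertain throughout is selected.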
Pages: 113-128
Number of pages: 16
Related papers
50 in total
  • [31] A deep learning approach for Named Entity Recognition in Urdu language
    Anam, Rimsha
    Waqas Anwar, Muhammad
    Hasan Jamal, Muhammad
    Ijaz Bajwa, Usama
    de la Torre Diez, Isabel
    Silva Alvarado, Eduardo
    Soriano Flores, Emmanuel
    Ashraf, Imran
    PLOS ONE, 2024, 19 (03):
  • [32] ALDANER: Active Learning based Data Augmentation for Named Entity Recognition
    Moscato, Vincenzo
    Postiglione, Marco
    Sperli, Giancarlo
    Vignali, Andrea
    KNOWLEDGE-BASED SYSTEMS, 2024, 305
  • [33] CRF-based Active Learning for Chinese Named Entity Recognition
    Yao, Lin
    Sun, Chengjie
    Li, Shaofeng
    Wang, Xiaolong
    Wang, Xuan
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 1557 - +
  • [34] A study of active learning methods for named entity recognition in clinical text
    Chen, Yukun
    Lasko, Thomas A.
    Mei, Qiaozhu
    Denny, Joshua C.
    Xu, Hua
    JOURNAL OF BIOMEDICAL INFORMATICS, 2015, 58 : 11 - 18
  • [35] Named entity recognition in greek texts with an ensemble of SVMS and active learning
    Lucarelli, Giorgio
    Vasilakos, Xenofon
    Androutsopoulos, Ion
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2007, 16 (06) : 1015 - 1045
  • [36] Named Entity Recognition in Mammography Radiology Reports using a Multilingual Transfer Learning Approach
    Salazar Cabrera, Esteban Ricardo
    Santos Diaz, Alejandro
    Menasalvas, Ernesitina
    Tamez Pena, Jose Gerardo
    Robles, Victor
    2024 IEEE 37TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS 2024, 2024, : 273 - 277
  • [37] Continual Learning for Named Entity Recognition
    Monaikul, Natawut
    Castellucci, Giuseppe
    Filice, Simone
    Rokhlenko, Oleg
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 13570 - 13577
  • [38] Ensemble Learning for Named Entity Recognition
    Speck, Rene
    Ngomo, Axel-Cyrille Ngonga
    SEMANTIC WEB - ISWC 2014, PT I, 2014, 8796 : 519 - 534
  • [39] CLGLF: Confidence Learning Guides Label Fusion for Multimodal Named Entity Recognition Method
    Wang, Hai-Rong
    Wang, Tong
    Xu, Xi
    Jing, Bo-Xiang
    Chen, Fang-Ping
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52 (07): : 2429 - 2437
  • [40] Joint Learning of Named Entity Recognition and Entity Linking
    Martins, Pedro Henrique
    Marinho, Zita
    Martins, Andre F. T.
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019:): STUDENT RESEARCH WORKSHOP, 2019, : 190 - 196