CycleNER: An Unsupervised Training Approach for Named Entity Recognition

Cited by: 19
|
Authors
Iovine, Andrea [1 ]
Fang, Anjie [2 ]
Fetahu, Besnik [2 ]
Rokhlenko, Oleg [2 ]
Malmasi, Shervin [2 ]
Affiliations
[1] Univ Bari Aldo Moro, Bari, Italy
[2] Amazon.com Inc, Bellevue, WA USA
Keywords
natural language processing; named entity recognition; cycle-consistency; training; unsupervised training
DOI
10.1145/3485447.3512012
CLC Classification Number
TP3 [computing technology, computer technology]
Subject Classification Code
0812
Abstract
Named Entity Recognition (NER) is a crucial natural language understanding task for many downstream tasks such as question answering and retrieval. Despite significant progress in developing NER models for multiple languages and domains, scaling to emerging and/or low-resource domains still remains challenging, due to the costly nature of acquiring training data. We propose CycleNER, an unsupervised approach based on cycle-consistency training that uses two functions: (i) sentence-to-entity (S2E) and (ii) entity-to-sentence (E2S), to carry out the NER task. CycleNER does not require annotations, only a set of sentences with no entity labels and another independent set of entity examples. Through cycle-consistency training, the output from one function is used as input for the other (e.g., S2E → E2S) to align the representation spaces of both functions and thereby enable unsupervised training. Evaluation on several domains comparing CycleNER against supervised and unsupervised competitors shows that CycleNER achieves highly competitive performance with only a few thousand input sentences. We demonstrate competitive performance against supervised models, achieving 73% of supervised performance without any annotations on CoNLL03, while significantly outperforming unsupervised approaches.
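The two cycles described in the abstract can be sketched in miniature. Below is a toy illustration of the S-cycle (sentence → S2E → E2S → reconstructed sentence) and E-cycle (entities → E2S → S2E → reconstructed entities); the paper trains seq2seq models for S2E and E2S, whereas the rule-based stand-ins and the token-overlap loss here are purely illustrative assumptions:

```python
# Toy sketch of CycleNER's two cycle-consistency objectives.
# NOTE: s2e/e2s below are hypothetical rule-based stand-ins for the paper's
# learned seq2seq models; the loss is a simple token-overlap proxy.

def s2e(sentence):
    """Stand-in for the sentence-to-entity function: pick capitalized tokens."""
    return [tok for tok in sentence.split() if tok[0].isupper()]

def e2s(entities):
    """Stand-in for the entity-to-sentence function: fill a fixed template."""
    return "A report mentions " + " and ".join(entities) + " ."

def reconstruction_loss(original_tokens, reconstructed_tokens):
    """Fraction of original tokens missing from the reconstruction (0 = perfect)."""
    original = set(original_tokens)
    missing = original - set(reconstructed_tokens)
    return len(missing) / max(len(original), 1)

# S-cycle: sentence -> S2E -> E2S -> sentence'; compare against the input sentence.
sentence = "Alice met Bob in Paris ."
s_cycle_output = e2s(s2e(sentence))
s_loss = reconstruction_loss(sentence.split(), s_cycle_output.split())

# E-cycle: entities -> E2S -> S2E -> entities'; compare against the input entities.
entities = ["Alice", "Paris"]
e_cycle_output = s2e(e2s(entities))
e_loss = reconstruction_loss(entities, e_cycle_output)
```

In the actual method, gradients from both reconstruction losses update the two seq2seq models jointly, which is what aligns their representation spaces without labeled data; here the losses only show what each cycle measures.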
Pages: 2916-2924
Page count: 9
Related Papers
50 in total
  • [1] Unsupervised Paraphrasing Consistency Training for Low Resource Named Entity Recognition
    Wang, Rui
    Henao, Ricardo
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5303 - 5308
  • [2] Unsupervised cross-domain named entity recognition using entity-aware adversarial training
    Peng, Qi
    Zheng, Changmeng
    Cai, Yi
    Wang, Tao
    Xie, Haoran
    Li, Qing
    NEURAL NETWORKS, 2021, 138 : 68 - 77
  • [3] Unsupervised Ranking of Knowledge Bases for Named Entity Recognition
    Mrabet, Yassine
    Kilicoglu, Halil
    Demner-Fushman, Dina
    ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, 285 : 1248 - 1255
  • [4] A New Approach for Named Entity Recognition
    Ertopcu, Burak
    Kanburoglu, Ali Bugra
    Topsakal, Ozan
    Acikgoz, Onur
    Gurkan, Ali Tunca
    Ozenc, Berke
    Cam, Ilker
    Avar, Begum
    Ercan, Gokhan
    Yildiz, Olcay Taner
    2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 474 - 479
  • [5] The ConceptMapper Approach to Named Entity Recognition
    Tanenblatt, Michael
    Coden, Anni
    Sominsky, Igor
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010,
  • [6] A Named Entity Recognition Approach for Albanian
    Skenduli, Marjana Prifti
    Biba, Marenglen
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 1532 - 1537
  • [7] A pre-training and self-training approach for biomedical named entity recognition
    Gao, Shang
    Kotevska, Olivera
    Sorokine, Alexandre
    Christian, J. Blair
    PLOS ONE, 2021, 16 (02):
  • [8] A Self-training Approach for Few-Shot Named Entity Recognition
    Qian, Yudong
    Zheng, Weiguo
    WEB AND BIG DATA, PT II, APWEB-WAIM 2022, 2023, 13422 : 183 - 191
  • [9] A Genetic Approach for Biomedical Named Entity Recognition
    Ekbal, Asif
    Saha, Sriparna
    Sikdar, Utpal Kumar
    Hasanuzzaman, Md
    22ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2010), PROCEEDINGS, VOL 2, 2010, : 354 - +
  • [10] A New Approach for Arabic Named Entity Recognition
    Karaa, Wahiba
    Slimani, Thabet
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2017, 14 (03) : 332 - 338