Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

Authors
Jarrar, Mustafa [1]
Khalilia, Mohammed [1]
Ghanem, Sana [1]
Affiliation
[1] Birzeit Univ, Birzeit, Palestine
Keywords
Named Entity Recognition; Multi-Task Learning; Nested Entities; BERT; Arabic NER Corpus;
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline classification codes
081203; 0835;
Abstract
This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another entity mention. Wojood consists of about 550K Modern Standard Arabic (MSA) and dialect tokens that are manually annotated with 21 entity types, including person, organization, location, event and date. More importantly, the corpus is annotated with nested entities instead of the more common flat annotations. The data contains about 75K entities, 22.5% of which are nested. The inter-annotator evaluation of the corpus demonstrated strong agreement, with a Cohen's Kappa of 0.979 and an F1-score of 0.976. To validate our data, we used the corpus to train a nested NER model based on multi-task learning using the pre-trained AraBERT (Arabic BERT). The model achieved an overall micro F1-score of 0.884. Our corpus, the annotation guidelines, the source code and the pre-trained model are publicly available.
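The notion of a nested entity used in the abstract can be illustrated with a minimal sketch. It assumes entities are stored as token spans with a half-open `[start, end)` range. The `Span` type, the helper names, the English example sentence and the `ORG`/`GPE` labels are all hypothetical illustrations and are not taken from the Wojood corpus or its released code.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical entity span over a token list (not the Wojood format)."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str


def is_nested(inner: Span, outer: Span) -> bool:
    """True if `inner` lies inside `outer` and covers a different token range."""
    same_range = (inner.start, inner.end) == (outer.start, outer.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_range


def nested_spans(spans: List[Span]) -> List[Span]:
    """Return the spans that are embedded in at least one other span."""
    return [s for s in spans if any(is_nested(s, o) for o in spans)]


# Illustrative example: a place name embedded in an organization name.
tokens = ["Birzeit", "University", "in", "Palestine"]
spans = [
    Span(0, 2, "ORG"),  # "Birzeit University"
    Span(0, 1, "GPE"),  # "Birzeit", nested inside the ORG span
    Span(3, 4, "GPE"),  # "Palestine", a flat (non-nested) mention
]

nested = nested_spans(spans)
share = len(nested) / len(spans)  # fraction of mentions that are nested
```

A flat annotation scheme would have to choose between the two overlapping mentions in this example, whereas a nested scheme keeps both.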
Pages: 3626-3636
Number of pages: 11