Wojood: Nested Arabic Named Entity Corpus and Recognition using BERT

Citations: 0
Authors
Jarrar, Mustafa [1]
Khalilia, Mohammed [1]
Ghanem, Sana [1]
Affiliations
[1] Birzeit Univ, Birzeit, Palestine
Keywords
Named Entity Recognition; Multi-Task Learning; Nested Entities; BERT; Arabic NER Corpus
DOI
Not available
Chinese Library Classification
TP39 [Computer Applications]
Discipline classification codes
081203; 0835
Abstract
This paper presents Wojood, a corpus for Arabic nested Named Entity Recognition (NER). Nested entities occur when one entity mention is embedded inside another entity mention. Wojood consists of about 550K Modern Standard Arabic (MSA) and dialect tokens that are manually annotated with 21 entity types, including person, organization, location, event, and date. More importantly, the corpus is annotated with nested entities instead of the more common flat annotations. The data contains about 75K entities, 22.5% of which are nested. The inter-annotator evaluation of the corpus demonstrated strong agreement, with a Cohen's Kappa of 0.979 and an F1-score of 0.976. To validate our data, we used the corpus to train a nested NER model based on multi-task learning using the pre-trained AraBERT (Arabic BERT). The model achieved an overall micro F1-score of 0.884. Our corpus, the annotation guidelines, the source code, and the pre-trained model are publicly available.
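To make the abstract's terminology concrete, the following is a minimal sketch of how nested entity spans and a span-level micro F1-score might be represented and computed. It is an illustration under stated assumptions, not the authors' implementation: the `Span` type, the helper names, the label strings `ORG` and `LOC`, and the exact-match scoring rule are all hypothetical choices, and the paper's own counting and evaluation procedure may differ.

```python
from typing import List, NamedTuple, Set


class Span(NamedTuple):
    """A hypothetical entity span over token indices (not the corpus format)."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # entity type; the label names used below are assumptions


def is_nested(inner: Span, outer: Span) -> bool:
    """True if `inner` lies inside `outer` and covers a different token range."""
    return (
        outer.start <= inner.start
        and inner.end <= outer.end
        and (inner.start, inner.end) != (outer.start, outer.end)
    )


def nested_fraction(spans: Set[Span]) -> float:
    """Fraction of spans embedded inside at least one other span."""
    if not spans:
        return 0.0
    nested = [s for s in spans if any(is_nested(s, o) for o in spans)]
    return len(nested) / len(spans)


def micro_f1(gold: List[Set[Span]], pred: List[Set[Span]]) -> float:
    """Micro F1 over exact (start, end, label) matches, pooled across sentences."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in pred)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence: tokens = ["Birzeit", "University", "students"]
# The organization mention spans tokens 0-1; a location mention is nested in it.
gold_spans = {Span(0, 2, "ORG"), Span(0, 1, "LOC")}
pred_spans = {Span(0, 2, "ORG")}  # a flat tagger would miss the nested mention

print(nested_fraction(gold_spans))
print(micro_f1([gold_spans], [pred_spans]))
```

In this toy case a flat prediction recovers only the outer mention, so recall drops even though precision is perfect, which is the kind of loss a nested annotation scheme is meant to expose.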
Pages: 3626-3636
Page count: 11