Comparative study of text representation and learning for Persian named entity recognition

Cited by: 2
Authors
Pour, Mohammad Mahdi Abdollah [1]
Momtazi, Saeedeh [1]
Affiliation
[1] Amirkabir Univ Technol, Comp Engn Dept, Tehran, Iran
Keywords
contextualized representation; NER; Persian language processing
DOI
10.4218/etrij.2021-0269
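Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code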
0808 ; 0809 ;
Abstract
Transformer models have had a great impact on natural language processing (NLP) in recent years by enabling outstanding and efficient contextualized language models. Recent studies have used transformer-based language models for various NLP tasks, including Persian named entity recognition (NER). However, for complex tasks such as NER, it is difficult to determine which contextualized embedding will produce the best representation. Given the lack of comparative studies on combining different contextualized pretrained models with sequence modeling classifiers, we conduct a comparative study of different classifiers and embedding models. We use different transformer-based language models tuned with different classifiers and evaluate them on the Persian NER task, assessing the impact of text representation and text classification methods on performance. We train and evaluate the models on three Persian NER datasets: MoNa, Peyma, and Arman. Experimental results show that XLM-R with a linear layer and a conditional random field (CRF) layer performs best. This model achieved phrase-based F-measures of 70.04, 86.37, and 79.25 and word-based F-measures of 78, 84.02, and 89.73 on the MoNa, Peyma, and Arman datasets, respectively. These results represent state-of-the-art performance on the Persian NER task.
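The abstract reports both phrase-based and word-based F-measures, and the two can differ noticeably for the same predictions. The sketch below is a minimal illustration of that difference, not the authors' evaluation code: it assumes BIO-style tags and exact matching, and the function names (`extract_spans`, `phrase_f1`, `word_f1`) and the toy tag sequences are hypothetical.

```python
from typing import List, Set, Tuple


def extract_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, type) entity spans from BIO tags (assumed scheme)."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def f1(true_pos: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def phrase_f1(gold: List[str], pred: List[str]) -> float:
    """Phrase-level score: an entity counts only if boundaries and type both match."""
    g, p = extract_spans(gold), extract_spans(pred)
    return f1(len(g & p), len(p), len(g))


def word_f1(gold: List[str], pred: List[str]) -> float:
    """Word-level score: each non-O token is judged on its own tag."""
    true_pos = sum(1 for g, p in zip(gold, pred) if g != "O" and g == p)
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    return f1(true_pos, n_pred, n_gold)


# Toy example: the predicted ORG entity is one token too short.
gold = ["B-ORG", "I-ORG", "I-ORG", "O", "B-LOC"]
pred = ["B-ORG", "I-ORG", "O", "O", "B-LOC"]
print(phrase_f1(gold, pred), word_f1(gold, pred))
```

In this toy case the truncated entity is a complete miss at the phrase level but still earns partial credit at the word level, which is one reason the two kinds of score are reported separately.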
Pages: 794 - 804
Number of pages: 11
Related papers
50 in total
  • [1] Persian Automatic Text Summarization Based on Named Entity Recognition
    Khademi, Mohammad Ebrahim
    Fakhredanesh, Mohammad
    IRANIAN JOURNAL OF SCIENCE AND TECHNOLOGY-TRANSACTIONS OF ELECTRICAL ENGINEERING, 2020,
  • [2] A Comparative Study of Segment Representation for Biomedical Named Entity Recognition
    Shashirekha, H. L.
    Nayel, Hamada A.
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 1046 - 1052
  • [3] Persian Named Entity Recognition
    Dashtipour, Kia
    Gogate, Mandar
    Adeel, Ahsan
    Algarafi, Abdulrahman
    Howard, Newton
    Hussain, Amir
    2017 IEEE 16TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2017, : 79 - 83
  • [4] A study of active learning methods for named entity recognition in clinical text
    Chen, Yukun
    Lasko, Thomas A.
    Mei, Qiaozhu
    Denny, Joshua C.
    Xu, Hua
    JOURNAL OF BIOMEDICAL INFORMATICS, 2015, 58 : 11 - 18
  • [5] Learning Morpheme Representation for Mongolian Named Entity Recognition
    Weihua Wang
    Feilong Bao
    Guanglai Gao
    Neural Processing Letters, 2019, 50 : 2647 - 2664
  • [6] Learning Morpheme Representation for Mongolian Named Entity Recognition
    Wang, Weihua
    Bao, Feilong
    Gao, Guanglai
    NEURAL PROCESSING LETTERS, 2019, 50 (03) : 2647 - 2664
  • [7] A Comparative Study of Named Entity Recognition for Telugu
    Gorla, SaiKiranmai
    Murthy, N. L. Bhanu
    Malapati, Aruna
    PROCEEDINGS OF THE 9TH ANNUAL MEETING OF THE FORUM FOR INFORMATION RETRIEVAL EVALUATION (FIRE 2017), 2017, : 21 - 24
  • [8] A Hybrid Method for Persian Named Entity Recognition
    Ahmadi, Farid
    Moradi, Hamed
    2015 7th Conference on Information and Knowledge Technology (IKT), 2015,
  • [9] Transfer learning for Turkish named entity recognition on noisy text
    Kagan Akkaya, Emre
    Can, Burcu
    NATURAL LANGUAGE ENGINEERING, 2021, 27 (01) : 35 - 64
  • [10] A Hybrid Approach for Persian Named Entity Recognition
    Hamed Moradi
    Farid Ahmadi
    Mohammad-Reza Feizi-Derakhshi
    Iranian Journal of Science and Technology, Transactions A: Science, 2017, 41 : 215 - 222