NAFIS: A Gold Standard Corpus for Arabic Stemmers Evaluation

Cited by: 0
Authors
Namly, Driss [1 ]
Tajmout, Rachida [1 ]
Bouzoubaa, Karim [1 ]
Abouenour, Lahsen [1 ]
Affiliations
[1] Mohammed V Univ, Mohammadia Sch Engineers, Rabat, Morocco
Keywords
component; Arabic language; Arabic stemming; Stemmers evaluation; Evaluation corpus; Gold Standard Corpus;
DOI
Not available
Chinese Library Classification (CLC) number
F [Economics];
Discipline classification code
02;
Abstract
Arabic stemming, an important pre-processing task in Arabic natural language processing services and applications, suffers from two serious deficiencies: "unique stemming solution" and "stemmers' performance inconsistency". These defects are mainly caused by the absence of a Gold Standard Corpus. Defined as a collection of texts stored in an electronic format, selected to be representative of a particular language, collection, or genre, and manually annotated and enriched with additional linguistic information, such a corpus is used in stemmer benchmarking work. This paper presents NAFIS (Normalized Arabic Fragments for Inestimable Stemming), a gold standard corpus for Arabic stemming. We describe the methodology used to build NAFIS and use it as an evaluation corpus in a benchmarking exercise.
Pages: 1868-1877
Number of pages: 10