NAFIS: A Gold Standard Corpus for Arabic Stemmers Evaluation

Times cited: 0
Authors
Namly, Driss [1]
Tajmout, Rachida [1]
Bouzoubaa, Karim [1]
Abouenour, Lahsen [1]
Affiliation
[1] Mohammed V Univ, Mohammadia Sch Engineers, Rabat, Morocco
Keywords
Arabic language; Arabic stemming; Stemmers evaluation; Evaluation corpus; Gold Standard Corpus
DOI
Not available
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
Arabic stemming, an important pre-processing task in Arabic natural language processing services and applications, suffers from two serious deficiencies: "unique stemming solution" and "stemmers' performance inconsistency". These deficiencies are mainly caused by the absence of a Gold Standard Corpus, that is, a collection of texts stored in electronic format, selected to be representative of a particular language, collection, or genre, manually annotated, and enriched with additional linguistic information. Such a corpus is used to benchmark stemmers. This paper presents NAFIS (Normalized Arabic Fragments for Inestimable Stemming), a gold standard corpus for Arabic stemming. We describe the methodology used to build NAFIS and use it as the evaluation corpus in a stemmer benchmarking exercise.
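The abstract describes using a gold standard corpus as the evaluation corpus for benchmarking stemmers. The following is a minimal sketch of that kind of evaluation, not the paper's protocol. It assumes a hypothetical gold standard that pairs each word token with a set of accepted stems, and it scores each stemmer by simple accuracy. The data format, the four toy words, the two toy stemmers (`identity`, `strip_al`), and the metric are all illustrative assumptions; none of them come from NAFIS.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical gold-standard entry: a word token paired with the set of stems
# an annotator accepts for it. A set lets a token have more than one
# acceptable answer. This is NOT the actual NAFIS annotation format.
GoldEntry = Tuple[str, Set[str]]
Stemmer = Callable[[str], str]


def stem_accuracy(stem: Stemmer, gold: List[GoldEntry]) -> float:
    """Fraction of gold tokens whose predicted stem is among the accepted stems."""
    if not gold:
        raise ValueError("gold standard must contain at least one entry")
    correct = sum(1 for word, accepted in gold if stem(word) in accepted)
    return correct / len(gold)


def benchmark(stemmers: Dict[str, Stemmer], gold: List[GoldEntry]) -> List[Tuple[str, float]]:
    """Score every stemmer on the same gold data and rank them, best first."""
    scores = [(name, stem_accuracy(fn, gold)) for name, fn in stemmers.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# --- Illustrative toy stemmers and data (invented for this sketch) ---

def identity(word: str) -> str:
    """Baseline that leaves every word unchanged."""
    return word


def strip_al(word: str) -> str:
    """Toy light stemmer: remove the definite article 'ال' from longer words."""
    if word.startswith("ال") and len(word) > 3:
        return word[2:]
    return word


TOY_GOLD: List[GoldEntry] = [
    ("الكتاب", {"كتاب"}),     # "the book"
    ("المدرسة", {"مدرسة"}),   # "the school"
    ("كتب", {"كتب"}),         # no affix to remove
    ("والكتاب", {"كتاب"}),    # "and the book": the conjunction would also need stripping
]

if __name__ == "__main__":
    ranking = benchmark({"identity": identity, "strip_al": strip_al}, TOY_GOLD)
    print(ranking)  # [('strip_al', 0.75), ('identity', 0.25)]
```

Scoring against a set of accepted stems, rather than a single string, is one way to avoid penalizing a stemmer when more than one analysis is defensible. Because every stemmer is run on the same gold data, the resulting scores are directly comparable. A fuller benchmark might also report per-category errors.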
Pages: 1868-1877
Number of pages: 10
Related papers (50 in total)
  • [1] Benchmarking and assessing the performance of Arabic stemmers
    Al-Kabi, Mohammed N.
    Al-Radaideh, Qasem A.
    Akkawi, Khalid W.
    JOURNAL OF INFORMATION SCIENCE, 2011, 37 (02) : 111 - 119
  • [2] A Gold Multipurpose Arabic Corpus (GAC)
    Awdeh, Hussein
    Abdallah, Adelle
    Zaki, Youssef
    Bernard, Gilles
    Hajjar, Mohammad
    INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND ENERGY TECHNOLOGIES (ICECET 2021), 2021, : 643 - 647
  • [3] A Gold Standard-Based Approach for Arabic Ontology Evaluation
    Mezghanni, Imen Bouaziz
    Gargouri, Faiez
    PROCEEDINGS OF THE 18TH EUROPEAN CONFERENCE ON KNOWLEDGE MANAGEMENT (ECKM 2017), VOLS 1 AND 2, 2017, : 1153 - 1161
  • [4] A Leveled Reading Corpus of Modern Standard Arabic
    Al Khalil, Muhamed
    Saddiki, Hind
    Habash, Nizar
    Alfalasi, Latifa
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 2317 - 2321
  • [5] A Prototype for a Standard Arabic Sentiment Analysis Corpus
    Al-Kabi, Mohammed
    Al-Ayyoub, Mahmoud
    Alsmadi, Izzat
    Wahsheh, Heider
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2016, 13 (1A) : 163 - 170
  • [6] A Silver Standard Biomedical Corpus for Arabic Language
    Boudjellal, Nada
    Zhang, Huaping
    Khan, Asif
    Ahmad, Arshad
    Naseem, Rashid
    Dai, Lin
    COMPLEXITY, 2020, 2020 (2020)
  • [7] A Gold Standard Dependency Corpus for English
    Silveira, Natalia
    Dozat, Timothy
    de Marneffe, Marie-Catherine
    Bowman, Samuel R.
    Connor, Miriam
    Bauer, John
    Manning, Christopher D.
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 2897 - 2904
  • [8] On the Use of Arabic Stemmers to Increase the Recall of Information Retrieval Systems
    Nasra, Ihab
    Maree, Mohammed
    2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017, : 2462 - 2468
  • [9] Testing a Large Corpus of Natural Standard Arabic for Rhythm Class
    Dockendorf, Liz
    Almubayei, Dalal
    Benton, Matthew
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 771 - 771
  • [10] Comparative Analysis of Nine Arabic Stemmers on Microblog Information Retrieval
    Almazrua, Amal
    Almazrua, Manal
    Alkhalifa, Hend
    2020 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2020), 2020, : 60 - 65