NAFIS: A Gold Standard Corpus for Arabic Stemmers Evaluation

Cited by: 0

Authors:

Namly, Driss [1]

Tajmout, Rachida [1]

Bouzoubaa, Karim [1]

Abouenour, Lahsen [1]

Affiliations:

[1] Mohammed V Univ, Mohammadia Sch Engineers, Rabat, Morocco

Source:

VISION 2020: INNOVATION MANAGEMENT, DEVELOPMENT SUSTAINABILITY, AND COMPETITIVE ECONOMIC GROWTH, 2016, VOLS I - VII | 2016

Keywords:

component; Arabic language; Arabic stemming; Stemmers evaluation; Evaluation corpus; Gold Standard Corpus;

DOI:

Not available

Chinese Library Classification:

F [Economics];

Discipline classification code:

02;

Abstract:

Arabic stemming, an important pre-processing task in Arabic natural language processing services and applications, suffers from two serious deficiencies: "unique stemming solution" and "stemmers' performance inconsistency". These defects are mainly caused by the absence of a Gold Standard Corpus. Defined as a collection of texts stored in an electronic format, selected to be representative of a particular language, collection, or genre, and manually annotated and enriched with additional linguistic information, such a corpus is used in stemmer benchmarking work. This paper presents NAFIS (Normalized Arabic Fragments for Inestimable Stemming), a gold standard corpus for Arabic stemming. We describe the methodology used to build NAFIS and use it as the evaluation corpus in a benchmarking exercise.

Pages: 1868 - 1877

Page count: 10
