ANT Corpus: An Arabic News Text Collection for Textual Classification

Cited by: 23
Authors
Chouigui, Amina [2]
Ben Khiroun, Oussama [1, 2]
Elayeb, Bilel [1, 3]
Affiliations
[1] Manouba Univ, RIADI Res Lab, ENSI, Manouba 2010, Tunisia
[2] Sousse Univ, Natl Engn Sch Sousse, ENISO, Sousse 4002, Tunisia
[3] Emirates Coll Technol, POB 41009, Abu Dhabi, U Arab Emirates
Keywords
Arabic language; standard Arabic corpus; text classification; RSS crawling; TREC format; SVM; NB; AGREEMENT; KAPPA
DOI
10.1109/AICCSA.2017.22
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose a new online Arabic corpus of news articles, named ANT Corpus, collected from RSS feeds. Each document represents an article structured in the standard XML TREC format. We use the ANT Corpus for Text Classification (TC), applying the SVM and Naive Bayes (NB) classifiers to assign each article its predefined category. We also study the contribution of term weighting, stop-word removal and light stemming to Arabic TC. The experimental results show that text length considerably affects TC accuracy and that title words are not sufficiently significant to achieve good classification rates. In conclusion, the SVM method gives the best classification results for both the title and text parts.
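The abstract does not describe the authors' implementation, so the following is only a minimal sketch of the general idea of Naive Bayes text classification with stop-word removal. It assumes a multinomial model with add-one smoothing, uses a hypothetical stop-word list and English toy documents in place of Arabic articles, and leaves out term weighting, light stemming and the SVM classifier.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "in", "and", "to"}


def tokenize(text, remove_stop_words=True):
    """Lowercase, split on whitespace, optionally drop stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        # Tokens never seen in training carry no evidence and are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (assumed categories, not taken from the ANT Corpus).
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the bank raised interest rates",
    "shares in the market fell sharply",
]
train_labels = ["sport", "sport", "economy", "economy"]

model = NaiveBayes().fit(train_docs, train_labels)
predicted = model.predict("the goal decided the match")
print(predicted)
```

With so few tokens per document, every prediction rests on one or two matching words, which loosely illustrates why short inputs such as titles give a classifier less evidence than full texts.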
Pages: 135-142
Number of pages: 8