Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning

Cited by: 2
Authors
Ababneh A.H. [1]
Affiliations
[1] Computer Science Department, American University of Madaba, Madaba
Keywords
K-nearest neighbor (KNN); Logistic regression (LR); Naive Bayes (NB); Random forest (RF); Support vector machine (SVM); Text classification (TC)
DOI
10.1016/j.jnlest.2022.100160
Abstract
Training and testing models for text classification depend mainly on pre-classified text document datasets. Recently, seven datasets have emerged for Arabic text classification: Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004. This study investigates which of these datasets can provide significant training and fair evaluation for text classification (TC). The investigation uses well-known and accurate learning models: naive Bayes (NB), random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM), and logistic regression (LR). We present relevance and time measures for training the models on these datasets, giving Arabic language researchers a solid basis of comparison for selecting an appropriate dataset. The performances of the five learning models across the seven datasets are measured and compared with the performances of the same models trained on a well-known English language dataset. The analysis of the relevance and time scores shows that training the SVM model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time, with an accuracy of 82%. © 2022, Journal of Electronic Science and Technology. All Rights Reserved.
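The abstract does not give implementation details, so the following is only a minimal sketch of what measuring a relevance score and a training time for one model on one dataset could look like. It assumes plain accuracy as the relevance measure, uses a small hand-written multinomial naive Bayes (one of the five model families named above) in place of the paper's actual models, and runs on an invented toy English corpus that is not drawn from any of the seven datasets. The names `MultinomialNB`, `evaluate`, `train`, and `test` are hypothetical.

```python
import math
import time
from collections import Counter, defaultdict


class MultinomialNB:
    """Tiny multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_one(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate(model, train, test):
    """Return (accuracy, training_seconds) for one model on one split."""
    train_docs, train_labels = zip(*train)
    start = time.perf_counter()
    model.fit(train_docs, train_labels)
    train_seconds = time.perf_counter() - start
    correct = sum(model.predict_one(doc) == label for doc, label in test)
    return correct / len(test), train_seconds


# Toy placeholder corpus (an assumption; NOT from any of the seven datasets).
train = [
    ("team wins match goal".split(), "sports"),
    ("player scores goal league".split(), "sports"),
    ("bank raises interest rates".split(), "finance"),
    ("market stocks bank profit".split(), "finance"),
]
test = [
    ("goal in the league match".split(), "sports"),
    ("stocks and interest rates".split(), "finance"),
]

accuracy, train_seconds = evaluate(MultinomialNB(), train, test)
print(f"accuracy={accuracy:.2f} train_time={train_seconds:.6f}s")
```

Repeating `evaluate` for each model and each dataset would yield a table of (relevance, time) pairs of the kind the abstract compares. A real study would also need Arabic-specific preprocessing and a proper train/test protocol, neither of which is shown here.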
Pages: 187-208
Number of pages: 21
Related papers
50 results in total
  • [1] Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning
    Ahmad Hussein Ababneh
    Journal of Electronic Science and Technology, 2022, 20 (02) : 187 - 208
  • [2] Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning
    Ahmad Hussein Ababneh
    Journal of Electronic Science and Technology, 2022, (02) : 187 - 208
  • [3] TEXT CLASSIFICATION BASED ON SEMI-SUPERVISED LEARNING
    Vo Duy Thanh
    Vo Trung Hung
    Pham Minh Tuan
    Doan Van Ban
    2013 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2013, : 232 - 236
  • [4] A Proposed Deep Learning based Framework for Arabic Text Classification
    Sayed, Mostafa
    Abdelkader, Hatem
    Khedr, Ayman E.
    Salem, Rashed
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (08) : 305 - 313
  • [5] A Deep Learning Approach for Arabic Text Classification
    Sundus, Katrina
    Al-Haj, Fatima
    Hammo, Bassam
    2019 2ND INTERNATIONAL CONFERENCE ON NEW TRENDS IN COMPUTING SCIENCES (ICTCS), 2019, : 258 - 264
  • [6] Multi-Label Arabic Text Classification Based On Deep Learning
    Alsukhni, Batool
    2021 12TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2021, : 475 - 477
  • [7] Investigating the Impact of Preprocessing Techniques and Representation Models on Arabic Text Classification using Machine Learning
    Masadeh, Mahmoud
    Moustapha, A.
    Sharada, B.
    Hanumanthappa, J.
    Hemachandran, K.
    Chola, Channabasava
    Muaad, Abdullah Y.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (01) : 1115 - 1123
  • [8] Arabic Text Classification of News Articles Using Classical Supervised Classifiers
    Al Qadi, Leen
    El Rifai, Hozayfa
    Obaid, Safa
    Elnagar, Ashraf
    2019 2ND INTERNATIONAL CONFERENCE ON NEW TRENDS IN COMPUTING SCIENCES (ICTCS), 2019, : 238 - 243
  • [9] RTextTools: A Supervised Learning Package for Text Classification
    Jurka, Timothy P.
    Collingwood, Loren
    Boydstun, Amber E.
    Grossman, Emiliano
    van Atteveldt, Wouter
    R JOURNAL, 2013, 5 (01) : 6 - 12
  • [10] Supervised Contrast Learning Text Classification Model Based on Data Quality Augmentation
    Wu, Liang
    Zhang, Fangfang
    Cheng, Chao
    Song, Shinan
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (05)