Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning

Cited by: 2
Authors
Ababneh A.H. [1]
Affiliation
[1] Computer Science Department, American University of Madaba, Madaba
Keywords
K-nearest neighbor (KNN); Logistic regression (LR); Naive Bayes (NB); Random forest (RF); Support vector machine (SVM); Text classification (TC)
DOI
10.1016/j.jnlest.2022.100160
Abstract
Training and testing models for text classification (TC) depend mainly on pre-classified text document datasets. Recently, seven datasets have emerged for Arabic text classification: Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004. This study investigates which of these datasets can provide significant training and fair evaluation for TC. The investigation uses five well-known and accurate learning models: naive Bayes (NB), random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM), and logistic regression (LR). We present relevance and time measures for training the models on these datasets so that Arabic language researchers can select an appropriate dataset on a solid basis of comparison. The performances of the five learning models across the seven datasets are measured and compared with the performances of the same models trained on a well-known English language dataset. The analysis of the relevance and time scores shows that training the SVM model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time, with an accuracy of 82%. © 2022, Journal of Electronic Science and Technology. All Rights Reserved.
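The comparison the abstract describes (several models, several datasets, one accuracy score and one training time per pair) can be sketched roughly as follows. This is a minimal illustration, not the paper's code. The names `TinyNaiveBayes`, `MajorityBaseline`, `benchmark`, `toy_models`, and `toy_datasets` are hypothetical. The toy documents are English placeholder tokens, not any of the seven Arabic datasets. A small hand-written naive Bayes and a majority-class baseline stand in for the five models named in the abstract.

```python
import math
import time
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (toy stand-in for NB)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, docs):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        predictions = []
        for doc in docs:
            best_label, best_score = None, -math.inf
            for label, n_docs in self.class_counts.items():
                total_words = sum(self.word_counts[label].values())
                # Log prior plus smoothed log likelihood of each token.
                score = math.log(n_docs / total_docs)
                for token in doc.split():
                    count = self.word_counts[label][token]
                    score += math.log((count + 1) / (total_words + vocab_size))
                if score > best_score:
                    best_label, best_score = label, score
            predictions.append(best_label)
        return predictions


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, docs, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, docs):
        return [self.label] * len(docs)


def benchmark(models, datasets):
    """Train every model on every dataset; record accuracy and training time."""
    results = []
    for data_name, (train_x, train_y, test_x, test_y) in datasets.items():
        for model_name, make_model in models.items():
            model = make_model()
            start = time.perf_counter()
            model.fit(train_x, train_y)
            train_seconds = time.perf_counter() - start
            predicted = model.predict(test_x)
            correct = sum(p == t for p, t in zip(predicted, test_y))
            results.append({
                "dataset": data_name,
                "model": model_name,
                "accuracy": correct / len(test_y),
                "train_seconds": train_seconds,
            })
    return results


# Hypothetical toy data: (train docs, train labels, test docs, test labels).
toy_datasets = {
    "toy-news": (
        ["match goal team", "team player goal", "player match win",
         "bank market price", "market stock bank"],
        ["sports", "sports", "sports", "economy", "economy"],
        ["goal team win", "stock price bank"],
        ["sports", "economy"],
    ),
}
toy_models = {"NB": TinyNaiveBayes, "baseline": MajorityBaseline}

for row in benchmark(toy_models, toy_datasets):
    print(row["dataset"], row["model"], row["accuracy"], row["train_seconds"])
```

Keeping the timer around `fit` only means the time column reflects training cost rather than prediction cost, which matches the abstract's phrase "time measures of training the models". Whether the paper timed it the same way is not stated in this record.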
Pages: 187-208
Page count: 21
Related papers
50 in total
  • [31] Arabic Text Classification Based on Word and Document Embeddings
    El Mahdaouy, Abdelkader
    Gaussier, Eric
    El Alaoui, Said Ouatik
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2016, 2017, 533 : 32 - 41
  • [32] A New SVM Method for Short Text Classification Based on Semi-Supervised Learning
    Yin, Chunyong
    Xiang, Jun
    Zhang, Hui
    Wang, Jin
    Yin, Zhichao
    Kim, Jeong-Uk
    2015 4TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION TECHNOLOGY AND SENSOR APPLICATION (AITS), 2015, : 100 - 103
  • [33] Rough set and ensemble learning based semi-supervised algorithm for text classification
    Shi, Lei
    Ma, Xinming
    Xi, Lei
    Duan, Qiguo
    Zhao, Jingying
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (05) : 6300 - 6306
  • [34] Classification of Cyberbullying Text in Arabic
    Rachid, Benaissa Azzeddine
    Azza, Harbaoui
    Ben Ghezala, Hajjami Henda
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [35] Performance Analysis of Supervised Machine Learning Algorithms for Text Classification
    Mishu, Sadia Zaman
    Rafiuddin, S. M.
    PROCEEDINGS OF THE 2016 19TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2016, : 409 - 413
  • [36] Amharic Text Complexity Classification Using Supervised Machine Learning
    Nigusie, Gebregziabihier
    Tegegne, Tesfa
    ARTIFICIAL INTELLIGENCE AND DIGITALIZATION FOR SUSTAINABLE DEVELOPMENT, ICAST 2022, 2023, 455 : 1 - 12
  • [37] Text Message Classification Using Supervised Machine Learning Algorithms
    Merugu, Suresh
    Reddy, M. Chandra Shekhar
    Goyal, Ekansh
    Piplani, Lakshay
    ICCCE 2018, 2019, 500 : 141 - 150
  • [38] Imbalanced Classification Algorithm for Semi Supervised Text Learning (iCASSTLE)
    Banerjee, Debanjana
    Prabhat, Gyan
    Bhowal, Riyanka
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 1012 - 1016
  • [39] Survey on supervised machine learning techniques for automatic text classification
    Kadhim, Ammar Ismael
    ARTIFICIAL INTELLIGENCE REVIEW, 2019, 52 (01) : 273 - 292
  • [40] Deep Text Prior: Weakly Supervised Learning for Assertion Classification
    Liventsev, Vadim
    Fedulova, Irina
    Dylov, Dmitry
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: WORKSHOP AND SPECIAL SESSIONS, 2019, 11731 : 243 - 257