Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning

Cited by: 2
Authors
Ababneh A.H. [1]
Affiliation
[1] Computer Science Department, American University of Madaba, Madaba
Keywords
K-nearest neighbor (KNN); Logistic regression (LR); Naive Bayes (NB); Random forest (RF); Support vector machine (SVM); Text classification (TC)
DOI
10.1016/j.jnlest.2022.100160
Abstract
Training and testing text classification models depends mainly on pre-classified text document datasets. Recently, seven datasets have emerged for Arabic text classification, including Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004. This study investigates which of these datasets can provide significant training and fair evaluation for text classification (TC). The investigation uses well-known and accurate learning models, including naive Bayes (NB), random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM), and logistic regression (LR). We present relevance and time measures for training the models on these datasets so that Arabic language researchers can select an appropriate dataset on a solid basis of comparison. The performance of the five learning models across the seven datasets is measured and compared with the performance of the same models trained on a well-known English language dataset. The analysis of the relevance and time scores shows that training the SVM model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time, with an accuracy of 82%. © 2022, Journal of Electronic Science and Technology. All Rights Reserved.
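The abstract reports two kinds of scores per model and dataset: a relevance measure and a training-time measure. The record does not say how the study computed either, so the following is only a minimal sketch under stated assumptions: accuracy stands in for the relevance measure, wall-clock time around `fit` stands in for the time measure, and a tiny pure-Python naive Bayes stands in for the NB model. The names `TinyNaiveBayes` and `evaluate` and the four-document toy corpus are hypothetical and are not taken from the paper or from any of the seven datasets.

```python
import math
import time
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, docs):
        total_docs = sum(self.class_counts.values())
        preds = []
        for doc in docs:
            scores = {}
            for label, n_docs in self.class_counts.items():
                counts = self.word_counts[label]
                denom = sum(counts.values()) + len(self.vocab)
                score = math.log(n_docs / total_docs)  # class prior
                for word in doc.split():
                    if word in self.vocab:  # skip out-of-vocabulary words
                        score += math.log((counts[word] + 1) / denom)
                scores[label] = score
            preds.append(max(scores, key=scores.get))
        return preds


def evaluate(model, train, test):
    """Return accuracy and wall-clock training time (seconds) for one model."""
    train_docs, train_labels = train
    test_docs, test_labels = test
    start = time.perf_counter()
    model.fit(train_docs, train_labels)
    train_seconds = time.perf_counter() - start
    preds = model.predict(test_docs)
    correct = sum(p == y for p, y in zip(preds, test_labels))
    return {"accuracy": correct / len(test_labels),
            "train_seconds": train_seconds}


# Hypothetical toy corpus; a real run would load one of the Arabic datasets.
train = (
    ["goal match team win", "team player score goal",
     "market stock price rise", "bank price market trade"],
    ["sports", "sports", "economy", "economy"],
)
test = (["player team goal", "stock market bank"], ["sports", "economy"])

result = evaluate(TinyNaiveBayes(), train, test)
print(result)
```

Because `evaluate` only relies on `fit` and `predict`, the same harness could in principle be looped over several models and several datasets to build a comparison table of the kind the abstract describes.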
Pages: 187-208
Page count: 21