Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning

Cited by: 2

Authors:

Ababneh A.H. [1]

Affiliation:

[1] Computer Science Department, American University of Madaba, Madaba

Source:

Journal of Electronic Science and Technology | 2022 / Vol. 20 / No. 02

Keywords:

K-nearest neighbor (KNN); Logistic regression (LR); Naive Bayes (NB); Random forest (RF); Support vector machine (SVM); Text classification (TC)

DOI:

10.1016/j.jnlest.2022.100160

Abstract:

Training and testing different models in the field of text classification depend mainly on pre-classified text document datasets. Recently, seven datasets have emerged for Arabic text classification, including Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004. This study investigates which of these datasets can provide significant training and fair evaluation for text classification (TC). In this investigation, well-known and accurate learning models are used, including naive Bayes (NB), random forest (RF), K-nearest neighbor (KNN), support vector machines (SVM), and logistic regression (LR) models. We present relevance and time measures of training the models with these datasets to enable Arabic language researchers to select the appropriate dataset on a solid basis of comparison. The performances of the five learning models across the seven datasets are measured and compared with the performances of the same models trained on a well-known English language dataset. The analysis of the relevance and time scores shows that training the SVM model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time, with an accuracy of 82%. © 2022, Journal of Electronic Science and Technology. All Rights Reserved.
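
The kind of measurement the abstract describes (a relevance score plus a training-time score for one model on one dataset) can be sketched roughly as follows. This is a minimal illustration only, not the paper's code: the `TinyNaiveBayes` class, the `evaluate` helper, and the toy English documents are all hypothetical stand-ins, and the study itself used full NB, RF, KNN, SVM, and LR models on the Arabic datasets listed above.

```python
import math
import time
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in doc.split():
                if word in self.vocab:  # ignore words unseen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate(model, train, test):
    """Return (accuracy, training_seconds) for one model on one dataset."""
    train_docs, train_labels = zip(*train)
    start = time.perf_counter()
    model.fit(train_docs, train_labels)
    train_seconds = time.perf_counter() - start
    correct = sum(model.predict(doc) == label for doc, label in test)
    return correct / len(test), train_seconds


# Hypothetical toy data standing in for a pre-classified news dataset.
train = [
    ("team wins match goal", "sports"),
    ("player scores goal", "sports"),
    ("market stocks rise", "finance"),
    ("bank raises interest rates", "finance"),
]
test = [("goal in final match", "sports"), ("stocks and interest rates", "finance")]

accuracy, seconds = evaluate(TinyNaiveBayes(), train, test)
print(f"accuracy={accuracy:.2f} train_time={seconds:.6f}s")
```

Calling `evaluate` once per (model, dataset) pair and tabulating the two returned values would give a comparison grid of the sort the abstract refers to, under the assumption that accuracy is the relevance measure of interest.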

Pages: 187-208

Page count: 21
