Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning

Cited by: 2

Authors:

Ababneh A.H. [1]

Affiliations:

[1] Computer Science Department, American University of Madaba, Madaba

Source:

Journal of Electronic Science and Technology | 2022 / Vol. 20 / No. 02

Keywords:

K-nearest neighbor (KNN); Logistic regression (LR); Naive Bayes (NB); Random forest (RF); Support vector machine (SVM); Text classification (TC)

DOI:

10.1016/j.jnlest.2022.100160

Abstract:

Training and testing models for text classification depend mainly on pre-classified text document datasets. Recently, seven datasets have emerged for Arabic text classification, including Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004. This study investigates which of these datasets can provide significant training and fair evaluation for text classification (TC). The investigation uses well-known and accurate learning models: naive Bayes (NB), random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM), and logistic regression (LR). We present relevance and time measures for training the models on these datasets so that Arabic language researchers can select an appropriate dataset on a solid basis of comparison. The performance of the five learning models across the seven datasets is measured and compared with the performance of the same models trained on a well-known English language dataset. The analysis of the relevance and time scores shows that training the SVM model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time, with an accuracy of 82%. © 2022, Journal of Electronic Science and Technology. All Rights Reserved.
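
The abstract describes scoring each model by a relevance measure (such as accuracy) together with its training time. The record does not give the paper's code or settings, so the following is only a minimal sketch of that kind of measurement, assuming a pure-Python multinomial naive Bayes with add-one smoothing and a tiny made-up toy corpus. The function names (`train_nb`, `predict_nb`, `evaluate`) and the data are hypothetical and are not taken from the paper.

```python
import math
import time
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)       # number of documents per class
    word_counts = defaultdict(Counter)   # per-class token frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior for one document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in vocab:  # ignore tokens never seen during training
                # add-one (Laplace) smoothing
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(train_docs, train_labels, test_docs, test_labels):
    """Return (accuracy on the test set, training time in seconds)."""
    start = time.perf_counter()
    model = train_nb(train_docs, train_labels)
    train_seconds = time.perf_counter() - start
    correct = sum(
        predict_nb(model, doc) == label
        for doc, label in zip(test_docs, test_labels)
    )
    return correct / len(test_labels), train_seconds


# Hypothetical toy corpus; a real run would load one of the datasets named above.
train_docs = [
    ["match", "goal", "team"],
    ["team", "coach", "win"],
    ["market", "stock", "bank"],
    ["bank", "price", "trade"],
]
train_labels = ["sports", "sports", "economy", "economy"]
test_docs = [["goal", "win"], ["stock", "price"]]
test_labels = ["sports", "economy"]

accuracy, train_seconds = evaluate(train_docs, train_labels, test_docs, test_labels)
print(f"accuracy={accuracy:.2f} train_time={train_seconds:.6f}s")
```

Repeating `evaluate` for each model and dataset pair would give a table of accuracy and time scores of the kind the abstract compares.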

Pages: 187-208

Page count: 21
