A Superior Arabic Text Categorization Deep Model (SATCDM)

Cited by: 23
Authors
Alhawarat, M. [1 ]
Aseeri, Ahmad O. [1 ]
Affiliations
[1] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, Al Kharj 11942, Saudi Arabia
Keywords
Documents classification; deep learning; Arabic language; convolutional neural networks; word embedding; skip-gram; word2vec; CLASSIFICATION;
DOI
10.1109/ACCESS.2020.2970504
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Categorizing Arabic text documents is an important research topic in Natural Language Processing (NLP) and Machine Learning (ML). The number of Arabic documents grows rapidly every day as new web pages, news articles, and social media content are added. Hence, classifying such documents into specific classes is of high importance to many people and applications. The Convolutional Neural Network (CNN) is a class of deep learning models that has been shown to be useful for many NLP tasks, including text translation and text categorization for the English language. Word embedding is a text representation that maps text terms to real-valued vectors in a vector space, capturing both syntactic and semantic traits of the text. Current research on classifying Arabic text documents mostly uses traditional text representations such as bag-of-words and TF-IDF weighting; few studies use word embedding. Traditional ML algorithms have already been applied to Arabic text categorization with good results. In this study, we present a Multi-Kernel CNN model for classifying Arabic news documents enriched with n-gram word embedding, which we call A Superior Arabic Text Categorization Deep Model (SATCDM). Evaluated on 15 freely available datasets, the proposed solution achieves accuracy ranging from 97.58% to 99.90%, which is superior to similar studies on the Arabic document classification task.
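The abstract names a multi-kernel CNN over word embeddings but gives no hyperparameters. The following is only a minimal sketch of that general idea in plain Python, not the authors' implementation: the kernel widths (2, 3, 4), the filter count, the embedding size, the random untrained weights, and the placeholder tokens `w1` to `w5` are all assumptions made for illustration. Each branch slides filters of one width over the embedded token sequence, applies ReLU and max-over-time pooling, and the pooled features from all branches are concatenated and passed to a softmax layer.

```python
import math
import random


def embed(tokens, table):
    """Look up one dense vector per token (stand-in for pretrained skip-gram vectors)."""
    return [table[t] for t in tokens]


def conv_max_pool(seq, filters, width):
    """Apply each filter to every window of `width` tokens, then ReLU and max-pool."""
    pooled = []
    for weights, bias in filters:
        best = 0.0  # starting at 0 makes the running max equal to max(ReLU(activation))
        for start in range(len(seq) - width + 1):
            window = [x for vec in seq[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(weights, window)) + bias
            best = max(best, activation)
        pooled.append(best)
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_model(vocab, dim, widths, n_filters, n_classes, seed=0):
    """Create random (untrained) parameters; all sizes here are hypothetical."""
    rng = random.Random(seed)

    def rand(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    table = {tok: rand(dim) for tok in vocab}
    branches = {w: [(rand(w * dim), 0.0) for _ in range(n_filters)] for w in widths}
    dense = [rand(len(widths) * n_filters) for _ in range(n_classes)]
    return table, branches, dense


def predict(tokens, model):
    """Run one forward pass: embed, convolve per kernel width, concatenate, classify."""
    table, branches, dense = model
    seq = embed(tokens, table)
    features = []
    for width, filters in branches.items():
        features.extend(conv_max_pool(seq, filters, width))
    scores = [sum(w * f for w, f in zip(row, features)) for row in dense]
    return softmax(scores)


vocab = ["w1", "w2", "w3", "w4", "w5"]  # placeholder tokens, not real data
model = build_model(vocab, dim=4, widths=(2, 3, 4), n_filters=3, n_classes=3)
probs = predict(["w1", "w3", "w2", "w5", "w4"], model)
print(probs)  # class probabilities from untrained weights, so not meaningful
```

Using several kernel widths in parallel lets the classifier respond to word n-grams of different lengths at once, which is the usual motivation for a multi-kernel design. In practice the embedding table would come from a trained word2vec model and the weights would be learned with a deep learning framework.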
Pages: 24653 - 24661
Number of pages: 9
Related papers
50 in total
  • [31] Enhanced Filter Feature Selection Methods for Arabic Text Categorization
    Ghareb, Abdullah Saeed
    Abu Bakar, Azuraliza
    Al-Radaideh, Qasem A.
    Hamdan, Abdul Razak
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2018, 8 (02) : 1 - 24
  • [32] Coordinate Model for Text Categorization
    Jiang, Wei
    Chen, Lei
    TRANSACTIONS ON EDUTAINMENT V, 2011, 6530 : 214 - 223
  • [33] Word Sense Representation based-method for Arabic Text Categorization
    El-Alami, Fatima-Zahra
    Ouatik El Alaoui, Said
    9TH INTERNATIONAL SYMPOSIUM ON SIGNAL, IMAGE, VIDEO AND COMMUNICATIONS (ISIVC 2018), 2018, : 141 - 146
  • [34] An Arabic text categorization approach using term weighting and multiple reducts
    Al-Radaideh, Qasem A.
    Al-Abrat, Mohammed A.
    SOFT COMPUTING, 2019, 23 (14) : 5849 - 5863
  • [35] A Comparative Study of Statistical Feature Reduction Methods for Arabic Text Categorization
    Harrag, Fouzi
    El-Qawasmeh, Eyas
    Al-Salman, Abdul Malik S.
    NETWORKED DIGITAL TECHNOLOGIES, PT 2, 2010, 88 : 676 - +
  • [36] Robust Arabic Text Categorization by Combining Convolutional and Recurrent Neural Networks
    Ameur, Mohamed Seghir Hadj
    Belkebir, Riadh
    Guessoum, Ahmed
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (05)
  • [37] An Arabic text categorization approach using term weighting and multiple reducts
    Qasem A. Al-Radaideh
    Mohammed A. Al-Abrat
    Soft Computing, 2019, 23 : 5849 - 5863
  • [38] Arabic Text Categorization Using SVM Active Learning Technique : An Overview
    Goudjil, Mohamed
    Koudil, Mouloud
    Hammami, Nacereddine
    Bedda, Mouldi
    Alruily, Meshrif
    WORLD CONGRESS ON COMPUTER & INFORMATION TECHNOLOGY (WCCIT 2013), 2013,
  • [39] Attention-Based Deep Learning Model for Arabic Handwritten Text Recognition
    Gader T.B.A.
    Echi A.K.
    Machine Graphics and Vision, 2022, 31 (1-4) : 49 - 73
  • [40] Hybrid deep learning model for Arabic text classification based on mutual information
    Abdulghani, Farah A.
    Abdullah, Nada A. Z.
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2022, 43 (08) : 1901 - 1908