A Superior Arabic Text Categorization Deep Model (SATCDM)

Cited by: 23
Authors
Alhawarat, M. [1]
Aseeri, Ahmad O. [1]
Affiliations
[1] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, Al Kharj 11942, Saudi Arabia
Keywords
Documents classification; deep learning; Arabic language; convolutional neural networks; word embedding; skip-gram; word2vec; classification
DOI
10.1109/ACCESS.2020.2970504
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Categorizing Arabic text documents is an important research topic in Natural Language Processing (NLP) and Machine Learning (ML). The number of Arabic documents grows daily as new web pages, news articles, and social media content are added, so classifying such documents into specific classes matters to many people and applications. The Convolutional Neural Network (CNN) is a class of deep learning models that has been shown to be useful for many NLP tasks, including text translation and text categorization for the English language. Word embedding is a text representation that maps text terms to real-valued vectors in a vector space, capturing both syntactic and semantic traits of text. Current studies on classifying Arabic text documents mostly use traditional text representations such as bag-of-words and TF-IDF weighting; few use word embedding. Traditional ML algorithms have already been applied to Arabic text categorization with good results. In this study, we present a Multi-Kernel CNN model for classifying Arabic news documents enriched with n-gram word embedding, which we call A Superior Arabic Text Categorization Deep Model (SATCDM). Evaluated on 15 freely available datasets, the proposed solution achieves very high accuracy compared with current research in Arabic text categorization: accuracy ranges from 97.58% to 99.90%, which is superior to similar studies on the Arabic document classification task.
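The abstract names the general technique (a multi-kernel CNN over word embeddings) but gives no architectural details, so the following is only a minimal sketch of that idea, not the authors' implementation. It shows one forward pass in plain Python: kernels of several widths slide over a sequence of word vectors, each kernel's ReLU response is max-pooled over time, and the pooled features feed a softmax layer. Everything concrete here is an assumption made for illustration: the kernel widths (2, 3, 4), two filters per width, the embedding dimension of 4, the three classes, the placeholder tokens, and the random weights, which stand in for trained parameters and for pretrained skip-gram vectors.

```python
import math
import random


def conv1d_max(embedded, kernel, bias=0.0):
    """Apply one kernel (width x dim) at every position, then ReLU and max-over-time.

    Returns 0.0 if the sequence is shorter than the kernel width.
    """
    width = len(kernel)
    best = 0.0  # starting at 0 folds the ReLU into the max
    for start in range(len(embedded) - width + 1):
        total = bias
        for offset in range(width):
            row = embedded[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        best = max(best, total)
    return best


def multi_kernel_features(embedded, kernels_by_width):
    """Concatenate the pooled response of every kernel of every width."""
    features = []
    for width in sorted(kernels_by_width):
        for kernel in kernels_by_width[width]:
            features.append(conv1d_max(embedded, kernel))
    return features


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(features, class_weights):
    """Linear layer (one weight row per class) followed by softmax."""
    scores = [sum(f * w for f, w in zip(features, row)) for row in class_weights]
    return softmax(scores)


# Toy demo. All sizes and values below are hypothetical.
rng = random.Random(0)
DIM = 4  # assumed embedding dimension
tokens = ["tok_a", "tok_b", "tok_c", "tok_d", "tok_e"]  # placeholder document
embedding = {t: [rng.uniform(-1, 1) for _ in range(DIM)] for t in tokens}
embedded = [embedding[t] for t in tokens]

# Two random filters for each assumed kernel width.
kernels_by_width = {
    width: [
        [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(width)]
        for _ in range(2)
    ]
    for width in (2, 3, 4)
}

features = multi_kernel_features(embedded, kernels_by_width)
class_weights = [[rng.uniform(-1, 1) for _ in features] for _ in range(3)]
probs = classify(features, class_weights)
print(len(features), probs)
```

Using several kernel widths lets the model respond to word n-grams of different lengths at once, and max-over-time pooling makes the feature vector the same size regardless of document length. In practice the weights would be learned by backpropagation in a deep learning framework rather than drawn at random.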
Pages: 24653-24661
Number of pages: 9