A Superior Arabic Text Categorization Deep Model (SATCDM)

Cited by: 23
Authors
Alhawarat, M. [1 ]
Aseeri, Ahmad O. [1 ]
Affiliations
[1] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, Al Kharj 11942, Saudi Arabia
Keywords
Documents classification; deep learning; Arabic language; convolutional neural networks; word embedding; skip-gram; word2vec; CLASSIFICATION;
DOI
10.1109/ACCESS.2020.2970504
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Categorizing Arabic text documents is an important research topic in Natural Language Processing (NLP) and Machine Learning (ML). The number of Arabic documents grows daily as new web pages, news articles, and social media content are added. Hence, classifying such documents into specific classes is of high importance to many people and applications. Convolutional Neural Networks (CNNs) are a class of deep learning models that have been shown to be useful for many NLP tasks, including text translation and text categorization for the English language. Word embedding is a text representation that maps terms to real-valued vectors capturing both syntactic and semantic traits of text. Current research on classifying Arabic text documents mostly uses traditional text representations such as bag-of-words and TF-IDF weighting; few studies use word embedding. Traditional ML algorithms have already been applied to Arabic text categorization with good results. In this study, we present a Multi-Kernel CNN model for classifying Arabic news documents enriched with n-gram word embedding, which we call A Superior Arabic Text Categorization Deep Model (SATCDM). Evaluated on 15 freely available datasets, the proposed solution achieves very high accuracy compared to current research in Arabic text categorization. The model achieves an accuracy ranging from 97.58% to 99.90%, which is superior to similar studies on the Arabic document classification task.
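As a rough illustration of the multi-kernel idea mentioned in the abstract, the sketch below slides convolution kernels of several widths over a sequence of word vectors and keeps the strongest response of each kernel (max-over-time pooling). It is not the authors' implementation: the kernel widths (2, 3, 4), the number of filters, the embedding size, and the random weights are all assumptions chosen for illustration, and the final classification layer is omitted.

```python
import random


def conv1d_max(embedded, kernel, bias=0.0):
    """Slide one kernel (width x dim) over the token vectors, apply ReLU,
    and return the maximum response over all positions."""
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(embedded) - width + 1):
        response = bias
        for k in range(width):
            response += sum(w * x for w, x in zip(kernel[k], embedded[start + k]))
        best = max(best, response)
    return best


def multi_kernel_features(embedded, kernels_by_width):
    """Concatenate the pooled responses of every kernel of every width."""
    features = []
    for width in sorted(kernels_by_width):
        for kernel in kernels_by_width[width]:
            features.append(conv1d_max(embedded, kernel))
    return features


# Hypothetical setup: random values stand in for trained embeddings and filters.
rng = random.Random(0)
EMBED_DIM = 8          # assumed embedding size
VOCAB_SIZE = 50        # assumed vocabulary size
FILTERS_PER_WIDTH = 2  # assumed number of filters per kernel width

embedding_table = [
    [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)
]
kernels_by_width = {
    width: [
        [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(width)]
        for _ in range(FILTERS_PER_WIDTH)
    ]
    for width in (2, 3, 4)
}

token_ids = [3, 17, 42, 8, 25, 11]  # a toy document as vocabulary indices
embedded = [embedding_table[t] for t in token_ids]
features = multi_kernel_features(embedded, kernels_by_width)
print(len(features))  # one pooled value per filter
```

In a full model, the pooled feature vector would typically feed a dense layer with a softmax over the document classes, and the embeddings would come from a trained model such as skip-gram word2vec rather than random values.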
Pages: 24653-24661
Number of pages: 9