A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION

Authors
El-Alami, Fatima-zahra [1]
El Mahdaouy, Abdelkader [1]
El Alaoui, Said Ouatik [1, 2]
En-Nahnahi, Noureddine [1]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Lab Informat & Modeling, FSDM, Fes, Morocco
[2] Ibn Tofail Univ, Natl Sch Appl Sci, Kenitra, Morocco
Keywords
Arabic text representation; deep autoencoder; feature selection; machine learning; text categorization
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Arabic text representation is a challenging task for applications such as text categorization and clustering, since the Arabic language is known for its variety, richness, and complex morphology. Bag-of-Words remains the most common method for Arabic text representation. However, it suffers from several shortcomings, such as a lack of semantics and the high dimensionality of the feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we propose a deep autoencoder-based representation for Arabic text categorization. It consists of three stages: (1) extracting the most relevant concepts from Arabic WordNet through feature selection; (2) learning features for text representation via an unsupervised algorithm; and (3) categorizing text using a deep autoencoder. Our method accounts for document semantics by combining implicit and explicit semantics, and it reduces the dimensionality of the feature space. To evaluate our method, we conducted several experiments on OSAC, a standard Arabic dataset. The results show that the proposed method is effective compared with state-of-the-art methods.
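The dimensionality-reduction idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each document has already been mapped to a binary vector over selected concepts, and it trains a tiny one-hidden-layer autoencoder in pure Python. The toy data, the class name TinyAutoencoder, the layer sizes, and the training settings are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyAutoencoder:
    """Hypothetical one-hidden-layer autoencoder: sigmoid encoder, linear decoder."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                      for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                      for _ in range(n_in)]
        self.b_dec = [0.0] * n_in

    def encode(self, x):
        # Compress the input vector into n_hidden values in (0, 1).
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h):
        # Reconstruct the input from the hidden code.
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.w_dec, self.b_dec)]

    def loss(self, data):
        # Mean squared reconstruction error per document.
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(data)

    def train(self, data, epochs=300, lr=0.1):
        # Plain per-sample gradient descent on 0.5 * ||y - x||^2.
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                y = self.decode(h)
                err = [yi - xi for yi, xi in zip(y, x)]
                # Backpropagate the error through the decoder weights.
                grad_h = [sum(err[i] * self.w_dec[i][j] for i in range(len(x)))
                          for j in range(len(h))]
                delta = [g * hj * (1.0 - hj) for g, hj in zip(grad_h, h)]
                for i in range(len(x)):
                    for j in range(len(h)):
                        self.w_dec[i][j] -= lr * err[i] * h[j]
                    self.b_dec[i] -= lr * err[i]
                for j in range(len(h)):
                    for i in range(len(x)):
                        self.w_enc[j][i] -= lr * delta[j] * x[i]
                    self.b_enc[j] -= lr * delta[j]


# Invented toy data: 6 documents over 8 selected concepts (1 = concept present).
docs = [
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 1],
]

model = TinyAutoencoder(n_in=8, n_hidden=3)
loss_before = model.loss(docs)
model.train(docs)
loss_after = model.loss(docs)
codes = [model.encode(x) for x in docs]  # 3-dimensional document representations
```

Each entry of codes is a 3-value representation of an 8-concept document. In a pipeline like the one the abstract outlines, such codes would be passed to a categorization step, which this sketch leaves out.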
Pages: 381-398
Page count: 18