A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION

Authors
El-Alami, Fatima-zahra [1]
El Mahdaouy, Abdelkader [1]
El Alaoui, Said Ouatik [1, 2]
En-Nahnahi, Noureddine [1]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Lab Informat & Modeling, FSDM, Fes, Morocco
[2] Ibn Tofail Univ, Natl Sch Appl Sci, Kenitra, Morocco
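Keywords
Arabic text representation; deep autoencoder; feature selection; machine learning; text categorization
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract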
Arabic text representation is a challenging task for applications such as text categorization and clustering, because the Arabic language is known for its variety, richness, and complex morphology. Bag-of-Words has until recently remained the most common method for Arabic text representation. However, it suffers from several shortcomings, such as a lack of semantics and the high dimensionality of the feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we proposed a deep autoencoder-based representation for Arabic text categorization. It consisted of three stages: (1) extracting the most relevant concepts from Arabic WordNet through feature selection; (2) learning features for text representation with an unsupervised algorithm; and (3) categorizing text using a deep autoencoder. The method accounted for document semantics by combining implicit and explicit semantics, and it reduced the dimensionality of the feature space. To evaluate the method, we conducted several experiments on OSAC, a standard Arabic dataset. The results showed that the proposed method was effective compared with state-of-the-art methods.
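The abstract gives no architecture or training details, so the following is only a minimal sketch of the general idea that an autoencoder can compress a high-dimensional Bag-of-Words vector into a low-dimensional code. Everything in it is an assumption made for illustration: the toy English tokens stand in for Arabic terms or WordNet concepts, the names `bag_of_words` and `TinyAutoencoder` are hypothetical, and the network has a single hidden layer (a shallow stand-in, not the deep autoencoder of the paper). The concept extraction and feature selection stages are not shown.

```python
import math
import random


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    vocab = sorted({tok for d in docs for tok in d.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for tok in d.split():
            v[index[tok]] += 1.0
        vectors.append(v)
    return vocab, vectors


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyAutoencoder:
    """Hypothetical one-hidden-layer autoencoder: sigmoid encoder, linear decoder."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sum(w * hi for w, hi in zip(row, h)) + b
                for row, b in zip(self.W2, self.b2)]

    def loss(self, data):
        """Mean over documents of the squared reconstruction error."""
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(data)

    def train_step(self, x, lr):
        """One stochastic gradient step on 0.5 * ||decode(encode(x)) - x||^2."""
        h = self.encode(x)
        y = self.decode(h)
        dy = [yi - xi for yi, xi in zip(y, x)]
        # Backpropagate through the linear decoder and the sigmoid encoder.
        dh = [sum(dy[i] * self.W2[i][j] for i in range(len(dy))) * h[j] * (1.0 - h[j])
              for j in range(len(h))]
        for i in range(len(dy)):
            for j in range(len(h)):
                self.W2[i][j] -= lr * dy[i] * h[j]
            self.b2[i] -= lr * dy[i]
        for j in range(len(h)):
            for k in range(len(x)):
                self.W1[j][k] -= lr * dh[j] * x[k]
            self.b1[j] -= lr * dh[j]

    def fit(self, data, epochs=200, lr=0.1):
        for _ in range(epochs):
            for x in data:
                self.train_step(x, lr)


# Toy corpus (assumption): two "sport" and two "finance" documents.
docs = [
    "sport match team goal",
    "team goal win match",
    "bank market price trade",
    "market trade bank money",
]
vocab, vectors = bag_of_words(docs)

model = TinyAutoencoder(n_in=len(vocab), n_hidden=2)
before = model.loss(vectors)
model.fit(vectors)
after = model.loss(vectors)

# Each document is now described by 2 numbers instead of len(vocab) counts.
codes = [model.encode(v) for v in vectors]
print(f"vocabulary size: {len(vocab)}, code size: {len(codes[0])}")
print(f"reconstruction error before: {before:.3f}, after: {after:.3f}")
```

In a setup like this, the hidden-layer codes would be passed to a classifier in place of the raw count vectors; how the paper performs the final categorization step is not described beyond the abstract.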
Pages: 381-398
Page count: 18