A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION

Authors
El-Alami, Fatima-zahra [1]
El Mahdaouy, Abdelkader [1]
El Alaoui, Said Ouatik [1, 2]
En-Nahnahi, Noureddine [1]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Lab Informat & Modeling, FSDM, Fes, Morocco
[2] Ibn Tofail Univ, Natl Sch Appl Sci, Kenitra, Morocco
Keywords
Arabic text representation; deep autoencoder; feature selection; machine learning; text categorization
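DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract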
Arabic text representation is a challenging task for applications such as text categorization and clustering, since the Arabic language is known for its variety, richness and complex morphology. Until recently, Bag-of-Words remained the most common method for Arabic text representation. However, it suffers from several shortcomings, such as a lack of semantics and the high dimensionality of the feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we propose a deep autoencoder-based representation for Arabic text categorization. It consists of three stages: (1) extracting the most relevant concepts from Arabic WordNet by means of feature selection; (2) learning features for text representation with an unsupervised algorithm; (3) categorizing text using a deep autoencoder. Our method accounts for document semantics by combining implicit and explicit semantics, and it reduces the dimensionality of the feature space. To evaluate our method, we conducted several experiments on the standard Arabic dataset OSAC. The results show the effectiveness of the proposed method compared with state-of-the-art methods.
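As an illustration of stage (1) only, the following is a minimal sketch of concept selection. The abstract does not name the selection criterion, so the chi-square statistic used here is an assumption, chosen because it is a common choice for text feature selection. The function names, the toy documents and the English concept labels are hypothetical placeholders, not Arabic WordNet data and not the authors' implementation.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 concept/category contingency table.

    n11: documents in the category that contain the concept
    n10: documents outside the category that contain the concept
    n01: documents in the category that lack the concept
    n00: documents outside the category that lack the concept
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A concept present in every document (or in none) carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_concepts(docs, labels, k):
    """Rank concepts by their best chi-square score over all categories."""
    categories = sorted(set(labels))
    vocabulary = sorted({concept for doc in docs for concept in doc})
    pairs = list(zip(docs, labels))
    scores = {}
    for concept in vocabulary:
        best = 0.0
        for cat in categories:
            n11 = sum(1 for d, y in pairs if concept in d and y == cat)
            n10 = sum(1 for d, y in pairs if concept in d and y != cat)
            n01 = sum(1 for d, y in pairs if concept not in d and y == cat)
            n00 = sum(1 for d, y in pairs if concept not in d and y != cat)
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[concept] = best
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(vocabulary, key=lambda c: (-scores[c], c))
    return ranked[:k], scores


# Hypothetical toy corpus: each document is the set of concepts it mentions.
docs = [
    {"sport", "team", "news"},
    {"sport", "match", "news"},
    {"economy", "bank", "news"},
    {"economy", "market", "news"},
]
labels = ["sports", "sports", "economy", "economy"]

selected, scores = select_concepts(docs, labels, k=2)
print(selected)  # ['economy', 'sport']
```

In this toy corpus the concept "news" occurs in every document and scores zero, which shows how such a filter shrinks the feature space before any representation learning takes place.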
Pages: 381-398
Page count: 18