Denoising Autoencoder as an Effective Dimensionality Reduction and Clustering of Text Data

Cited by: 12
Authors
Leyli-Abadi, Milad [1]
Labiod, Lazhar [1]
Nadif, Mohamed [1]
Affiliations
[1] Paris Descartes Univ, LIPADE, F-75006 Paris, France
Keywords
Auto-encoder; Deep learning; Cosine similarity; Neighborhood; Document clustering; Unsupervised learning; Dimensionality reduction; Framework
DOI
10.1007/978-3-319-57529-2_62
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning methods are widely used in vision and face recognition, but they have seldom been applied to text data. Text data is often represented by a sparse, high-dimensional document-term matrix. To handle such matrices, we present a new denoising auto-encoder for dimensionality reduction in which each document is influenced not only by its own information but also by information from its neighbors, as determined by the cosine similarity measure. The proposed auto-encoder can discover low-dimensional embeddings and, as a result, reveal the underlying effective manifold structure. The visual representation of these embeddings suggests that the documents can be suitably clustered with the Expectation-Maximization algorithm for Gaussian mixture models. On real-world datasets, the relevance of the presented auto-encoder for visualisation and document clustering is shown by a comparison with five widely used unsupervised dimensionality reduction methods, including the classic auto-encoder.
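The neighborhood idea in the abstract can be illustrated with a minimal sketch. It only shows how a document's row in a document-term matrix might be blended with its most cosine-similar neighbors. It is not the auto-encoder itself, and the abstract does not give the exact weighting, so the function names (`cosine_similarity`, `neighbor_weighted_rows`), the parameter `k`, the similarity-proportional weights, and the toy matrix are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def neighbor_weighted_rows(doc_term, k=1):
    """Blend each document with its k most cosine-similar neighbors.

    Assumed scheme (not taken from the paper): the document itself gets
    weight 1, each neighbor gets its cosine similarity as weight, and the
    weights are normalised to sum to 1.
    """
    n = len(doc_term)
    blended = []
    for i in range(n):
        # Similarity of document i to every other document.
        sims = [
            (cosine_similarity(doc_term[i], doc_term[j]), j)
            for j in range(n)
            if j != i
        ]
        # Highest similarity first; ties broken by document index.
        sims.sort(key=lambda s: (-s[0], s[1]))
        weighted = [(1.0, i)] + sims[:k]
        total = sum(w for w, _ in weighted)
        row = [
            sum(w * doc_term[j][t] for w, j in weighted) / total
            for t in range(len(doc_term[i]))
        ]
        blended.append(row)
    return blended


# Toy document-term matrix: two documents per topic (hypothetical data).
doc_term = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
]
blended = neighbor_weighted_rows(doc_term, k=1)
```

In this toy matrix each document's nearest neighbor shares its topic, so the blended rows stay zero on the terms of the other topic. Rows of this kind could serve as neighbor-aware inputs or reconstruction targets for an auto-encoder, which is one possible reading of the abstract rather than a confirmed detail of the method.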
Pages: 801-813
Number of pages: 13
Related papers
50 in total
  • [21] Dimensionality Reduction for Clustering of Nonlinear Industrial Data: A Tutorial
    Roh, Hae Rang
    Kim, Chae Sun
    Lee, Yongseok
    Lee, Jong Min
    KOREAN JOURNAL OF CHEMICAL ENGINEERING, 2025, : 987 - 1001
  • [22] Distributed dimensionality reduction of industrial data based on clustering
    Zhang, Yongyan
    Xie, Guo
    Wang, Wenqing
    Wang, Xiaofan
    Qian, Fucai
    Du, Xulong
    Du, Jinhua
    PROCEEDINGS OF THE 2018 13TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2018), 2018, : 370 - 374
  • [23] Image noise reduction by denoising autoencoder
    Yasenko, Lev
    Klyatchenko, Yaroslav
    Tarasenko-Klyatchenko, Oksana
    2020 IEEE 11TH INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS, SERVICES AND TECHNOLOGIES (DESSERT): IOT, BIG DATA AND AI FOR A SAFE & SECURE WORLD AND INDUSTRY 4.0, 2020, : 351 - 355
  • [24] An effective dimension reduction algorithm for clustering Arabic text
    Mohamed, A. A.
    EGYPTIAN INFORMATICS JOURNAL, 2020, 21 (01) : 1 - 5
  • [25] Reduction of Dimensionality in Structured Data Sets on Clustering Efficiency in Data Mining
    Pasha, Noor
    Ashokkumar, P. S.
    Venkatesh, P.
    Krishna, Gopal C.
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2017, : 1020 - 1023
  • [26] Novel segmented stacked autoencoder for effective dimensionality reduction and feature extraction in hyperspectral imaging
    Zabalza, Jaime
    Ren, Jinchang
    Zheng, Jiangbin
    Zhao, Huimin
    Qing, Chunmei
    Yang, Zhijing
    Du, Peijun
    Marshall, Stephen
    NEUROCOMPUTING, 2016, 185 : 1 - 10
  • [27] A Distributed Framework for Dimensionality Reduction and Denoising
    Schizas, Ioannis D.
    Aduroja, Abiodun
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2015, 63 (23) : 6379 - 6394
  • [28] Hyperspectral Data Dimensionality Reduction: A Comparative Study Between PCA and Autoencoder Methods
    Motsch, Jean
    Bergeon, Yves
    Krivanek, Vaclav
    MODELLING AND SIMULATION FOR AUTONOMOUS SYSTEMS, MESAS 2023, 2025, 14615 : 314 - 334
  • [29] High-Dimensional Text Clustering by Dimensionality Reduction and Improved Density Peak
    Sun, Yujia
    Platos, Jan
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2020, 2020 (2020):
  • [30] Using an Autoencoder for Dimensionality Reduction in Quantum Dynamics
    Reiter, Sebastian
    Schnappinger, Thomas
    de Vivie-Riedle, Regina
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: WORKSHOP AND SPECIAL SESSIONS, 2019, 11731 : 783 - 787