Denoising Autoencoder as an Effective Dimensionality Reduction and Clustering of Text Data

Times cited: 12
Authors
Leyli-Abadi, Milad [1]
Labiod, Lazhar [1]
Nadif, Mohamed [1]
Affiliation
[1] Paris Descartes Univ, LIPADE, F-75006 Paris, France
Keywords
Auto-encoder; Deep learning; Cosine similarity; Neighborhood; Document clustering; Unsupervised learning; Dimensionality reduction; FRAMEWORK
DOI
10.1007/978-3-319-57529-2_62
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning methods are widely used in computer vision and face recognition; however, they have seen very little application to text data. In this setting, the data are often represented by a sparse, high-dimensional document-term matrix. To handle such matrices, we present in this paper a new denoising autoencoder for dimensionality reduction, in which each document is affected not only by its own information but also by information from its neighbors, as determined by the cosine similarity measure. The proposed autoencoder can discover low-dimensional embeddings and, as a result, reveal the underlying effective manifold structure. Visualizing these embeddings suggests that the documents can suitably be clustered with the Expectation-Maximization (EM) algorithm for Gaussian mixture models. On real-world datasets, a comparison with five widely used unsupervised dimensionality reduction methods, including the classic autoencoder, shows the relevance of the proposed autoencoder for visualization and document clustering.
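The abstract does not give the exact formulation of how neighbors influence each document, so the following is only a minimal sketch of one plausible reading: each row of the document-term matrix is blended with a cosine-similarity-weighted mean of its k nearest neighbors before being fed to an autoencoder. The function names `cosine_similarity_matrix` and `neighbor_smoothed_input` and the parameters `k` and `alpha` are hypothetical illustrations, not the authors' code or settings.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows (documents) of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Avoid division by zero for empty documents; their rows stay all-zero.
    Xn = X / np.where(norms == 0, 1.0, norms)
    return Xn @ Xn.T


def neighbor_smoothed_input(X, k=2, alpha=0.5):
    """Blend each document with a similarity-weighted mean of its k nearest neighbors.

    Hypothetical sketch: k (neighborhood size) and alpha (weight kept on the
    document itself) are illustrative choices, not values taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    S = cosine_similarity_matrix(X)
    np.fill_diagonal(S, -np.inf)  # a document is not its own neighbor
    X_out = np.empty_like(X)
    for i in range(X.shape[0]):
        nbrs = np.argsort(S[i])[::-1][:k]  # indices of the k most similar documents
        w = np.clip(S[i, nbrs], 0.0, None)  # ignore negative similarities
        if w.sum() > 0:
            nbr_mean = (w[:, None] * X[nbrs]).sum(axis=0) / w.sum()
        else:
            nbr_mean = X[i]  # no similar neighbor: leave the document unchanged
        X_out[i] = alpha * X[i] + (1.0 - alpha) * nbr_mean
    return X_out


# Toy document-term matrix: documents 0 and 1 share terms, document 2 does not.
X = np.array([[2., 1., 0., 0.],
              [1., 1., 0., 0.],
              [0., 0., 1., 2.]])
print(neighbor_smoothed_input(X, k=1, alpha=0.5))
```

In this sketch, documents 0 and 1 pull toward each other, while document 2, which shares no terms with the others, is left unchanged. The blended matrix would then serve as the input to an autoencoder, and the resulting low-dimensional codes could be clustered with EM for a Gaussian mixture (for example with scikit-learn's `GaussianMixture`); neither of those steps is shown here.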
Pages: 801-813
Number of pages: 13