Denoising Autoencoder as an Effective Dimensionality Reduction and Clustering of Text Data

Cited by: 12
Authors
Leyli-Abadi, Milad [1]
Labiod, Lazhar [1]
Nadif, Mohamed [1]
Affiliations
[1] Paris Descartes Univ, LIPADE, F-75006 Paris, France
Keywords
Auto-encoder; Deep learning; Cosine similarity; Neighborhood; Document clustering; Unsupervised learning; Dimensionality reduction; Framework
DOI
10.1007/978-3-319-57529-2_62
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning methods are widely used in vision and face recognition; however, there is a real lack of application of such methods to text data. In this context, the data is often represented by a sparse, high-dimensional document-term matrix. To deal with such data matrices, we present in this paper a new denoising auto-encoder for dimensionality reduction, in which each document is affected not only by its own information but also by the information from its neighbors according to the cosine similarity measure. It turns out that the proposed auto-encoder can discover low-dimensional embeddings and, as a result, reveal the underlying effective manifold structure. The visual representation of these embeddings suggests that the set of documents is well suited to clustering with the Expectation-Maximization algorithm for Gaussian mixture models. On real-world datasets, the relevance of the presented auto-encoder for visualisation and document clustering is shown by a comparison with five widely used unsupervised dimensionality reduction methods, including the classic auto-encoder.
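The neighborhood idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the toy document-term matrix `X`, the helper names, the choice of `k`, and in particular the similarity-weighted averaging in `neighbor_weighted_rows`, which is one plausible way to let a document's representation be influenced by its cosine-nearest neighbors. It is not the authors' method.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def nearest_neighbors(X, i, k):
    """Indices of the k documents most cosine-similar to document i (i excluded)."""
    sims = [(cosine_similarity(X[i], X[j]), j) for j in range(len(X)) if j != i]
    # Sort by decreasing similarity; break ties by index for determinism.
    sims.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in sims[:k]]


def neighbor_weighted_rows(X, k):
    """Hypothetical weighting: blend each row with its k nearest neighbors.

    The document itself gets weight 1 and each neighbor gets its cosine
    similarity as weight; the result is the weighted average of those rows.
    """
    blended = []
    for i, row in enumerate(X):
        weights = {i: 1.0}
        for j in nearest_neighbors(X, i, k):
            weights[j] = cosine_similarity(row, X[j])
        total = sum(weights.values())
        blended.append(
            [sum(w * X[j][t] for j, w in weights.items()) / total
             for t in range(len(row))]
        )
    return blended


# Toy document-term matrix: rows are documents, columns are term counts.
X = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
]
targets = neighbor_weighted_rows(X, k=1)
```

In a fuller pipeline, rows such as `targets` could serve as the reconstruction targets of a denoising auto-encoder, and the learned low-dimensional codes could then be clustered with a Gaussian mixture model fitted by EM; both of those steps are omitted here.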
Pages: 801-813
Number of pages: 13
Related papers
50 in total
  • [41] Consensus Clustering for Dimensionality Reduction
    Rani, D. Sandhya
    Rani, T. Sobha
    Bhavani, S. Durga
    2014 SEVENTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3), 2014, : 148 - 153
  • [42] Nonlinear dimensionality reduction for clustering
    Tasoulis, Sotiris
    Pavlidis, Nicos G.
    Roos, Teemu
    PATTERN RECOGNITION, 2020, 107 (107)
  • [43] Reduction of dimensionality for perceptual clustering
    Benítez, C
    Lander, DK
    Ramirez, J
    IJCNN 2000: PROCEEDINGS OF THE IEEE-INNS-ENNS INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOL V, 2000, : 148 - 151
  • [44] Effective and Efficient Spectral Clustering on Text and Link Data
    Xu, Zhiqiang
    Ke, Yiping
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 357 - 366
  • [45] Taxonomy grooming algorithm - An autodidactic domain specific dimensionality reduction approach for fast clustering of social media text data
    Renjith, Shini
    Sreekumar, A.
    Jathavedan, M.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (11)
  • [46] Deep text clustering using stacked AutoEncoder
    Soodeh Hosseini
    Zahra Asghari Varzaneh
    Multimedia Tools and Applications, 2022, 81 : 10861 - 10881
  • [47] Deep text clustering using stacked AutoEncoder
    Hosseini, Soodeh
    Varzaneh, Zahra Asghari
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (08) : 10861 - 10881
  • [48] Parallel rare term vector replacement: Fast and effective dimensionality reduction for text
    Berka, T.
    Vajtersic, M.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2013, 73 (03) : 341 - 351
  • [49] An effective dimensionality reduction method for text classification based on TFP-tree
    Liu, Lu
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2018, 34 (03) : 1893 - 1905
  • [50] SCDRHA: A scRNA-Seq Data Dimensionality Reduction Algorithm Based on Hierarchical Autoencoder
    Zhao, Jianping
    Wang, Na
    Wang, Haiyun
    Zheng, Chunhou
    Su, Yansen
    FRONTIERS IN GENETICS, 2021, 12