Finding multiple stable clusterings

Authors
Juhua Hu
Qi Qian
Jian Pei
Rong Jin
Shenghuo Zhu
Affiliations
[1] Simon Fraser University, School of Computing Science
[2] Alibaba Group
Keywords
Multi-clustering; Clustering stability; Laplacian eigengap; Feature subspace
Abstract
Multi-clustering, which seeks multiple independent ways to partition a data set into groups, has many applications, such as customer relationship management, bioinformatics, and healthcare informatics. This paper addresses two fundamental questions in multi-clustering: how to model the quality of clusterings, and how to find multiple stable clusterings (MSC). We introduce to multi-clustering the notion of clustering stability based on the Laplacian eigengap, which was originally used by the regularized spectral learning method for similarity matrix learning. We prove mathematically that the larger the eigengap, the more stable the clustering. Furthermore, we propose a novel multi-clustering method, MSC. One advantage of our method over state-of-the-art multi-clustering methods is that it provides users with a feature subspace for understanding each clustering solution. Another advantage is that MSC does not require users to specify the number of clusters or the number of alternative clusterings, which is usually difficult to do without guidance. Our method can heuristically estimate the number of stable clusterings in a data set. We also discuss a practical way to make MSC applicable to large-scale data. We report an extensive empirical study that demonstrates the effectiveness of our method.
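As a rough illustration of the eigengap idea mentioned in the abstract, the sketch below compares the Laplacian eigengap obtained from two different single-feature subspaces of a toy data set. It is not the MSC algorithm from the paper. The Gaussian similarity, the symmetric normalized Laplacian, the choice k = 2, and the helper names `rbf_similarity` and `laplacian_eigengap` are all assumptions made for this example.

```python
import numpy as np

def rbf_similarity(X, gamma=1.0):
    """Dense Gaussian similarity matrix with a zero diagonal (assumed kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-gamma * d2)
    np.fill_diagonal(W, 0.0)
    return W

def laplacian_eigengap(W, k):
    """Gap between the (k+1)-th and k-th smallest eigenvalues of the
    symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)  # eigenvalues in ascending order
    return eigvals[k] - eigvals[k - 1]

# Toy data: feature 0 splits the points into two tight groups,
# feature 1 is unstructured noise.
rng = np.random.default_rng(0)
n = 40
feature_a = np.concatenate([rng.normal(0.0, 0.3, n), rng.normal(10.0, 0.3, n)])
feature_b = rng.normal(0.0, 0.3, 2 * n)
X = np.column_stack([feature_a, feature_b])

# Eigengap for k = 2 clusters, computed in each one-feature subspace.
gap_a = laplacian_eigengap(rbf_similarity(X[:, [0]]), k=2)
gap_b = laplacian_eigengap(rbf_similarity(X[:, [1]]), k=2)
print("eigengap using feature 0:", gap_a)
print("eigengap using feature 1:", gap_b)
```

On this toy data the subspace with two well-separated groups yields the larger gap, which matches the abstract's statement that a larger eigengap indicates a more stable clustering.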
Pages: 991-1021
Number of pages: 30