Scalable Spectral Clustering Using Random Binning Features

Cited by: 28
Authors
Wu, Lingfei [1]
Chen, Pin-Yu [1]
Yen, Ian En-Hsu [2]
Xu, Fangli [3]
Xia, Yinglong [4]
Aggarwal, Charu [1]
Affiliations
[1] IBM Res AI, Armonk, NY 10504 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Coll William & Mary, Williamsburg, VA 23187 USA
[4] Huawei Res, Shenzhen, Peoples R China
Keywords
Spectral clustering; Graph Construction; Random Binning Features; Eigendecomposition of Graph; PRIMME
DOI
10.1145/3219819.3220090
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spectral clustering is one of the most effective clustering approaches for capturing hidden cluster structures in data. However, it does not scale well to large problems because of its quadratic complexity in constructing the similarity graph and computing the subsequent eigendecomposition. Although a number of methods have been proposed to accelerate spectral clustering, most of them sacrifice considerable information in the original data in exchange for reducing these computational bottlenecks. In this paper, we present a novel scalable spectral clustering method that uses Random Binning (RB) features to accelerate both similarity graph construction and eigendecomposition simultaneously. Specifically, we implicitly approximate the graph similarity (kernel) matrix by the inner product of a large sparse feature matrix generated by RB. We then introduce a state-of-the-art SVD solver to efficiently compute the eigenvectors of this large matrix for spectral clustering. Using these two building blocks, we reduce the computational cost from quadratic to linear in the number of data points while achieving similar accuracy. Our theoretical analysis shows that spectral clustering via RB converges faster to exact spectral clustering than the standard Random Feature approximation. Extensive experiments on eight benchmarks show that the proposed method either outperforms or matches state-of-the-art methods in both accuracy and runtime. Moreover, our method exhibits linear scalability in both the number of data samples and the number of RB features.
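The two building blocks the abstract describes can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions: the approximated kernel is taken to be the Laplacian kernel exp(-||x - y||_1 / sigma), bin widths are drawn from Gamma(2, sigma) with offsets uniform in [0, width), the feature matrix Z is stored densely for this small demo, `numpy.linalg.svd` stands in for a sparse iterative solver such as PRIMME, and a plain Lloyd's k-means finishes the job. The function names `random_binning_features`, `kmeans`, and `rb_spectral_clustering` are hypothetical.

```python
import numpy as np


def random_binning_features(X, n_grids=64, sigma=1.0, seed=0):
    """Map X (N x d) to a random binning matrix Z with Z @ Z.T ~ K, where
    K[i, j] = exp(-||x_i - x_j||_1 / sigma) (assumed Laplacian kernel).

    Per grid and per dimension: bin width delta ~ Gamma(2, sigma) and offset
    u ~ Uniform[0, delta). Each point lands in exactly one bin per grid, so
    every row of Z has n_grids nonzeros equal to 1 / sqrt(n_grids).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    cols = np.empty((n, n_grids), dtype=np.int64)
    offset = 0
    for r in range(n_grids):
        delta = rng.gamma(shape=2.0, scale=sigma, size=d)
        u = rng.uniform(0.0, delta)
        bins = np.floor((X - u) / delta).astype(np.int64)
        # Only non-empty bins get a column: one id per distinct bin vector.
        _, ids = np.unique(bins, axis=0, return_inverse=True)
        ids = ids.ravel()
        cols[:, r] = offset + ids
        offset += int(ids.max()) + 1
    # Dense only for this small demo; at scale Z would be kept sparse.
    Z = np.zeros((n, offset))
    Z[np.repeat(np.arange(n), n_grids), cols.ravel()] = 1.0 / np.sqrt(n_grids)
    return Z


def kmeans(U, k, seed=0, n_iter=50):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centers = [U[rng.integers(len(U))]]
    for _ in range(1, k):
        dist = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


def rb_spectral_clustering(X, k, n_grids=64, sigma=1.0, seed=0):
    Z = random_binning_features(X, n_grids, sigma, seed)
    # Degrees of the implicit graph W = Z Z^T, computed as Z (Z^T 1) so the
    # N x N matrix W is never formed.
    deg = Z @ (Z.T @ np.ones(X.shape[0]))
    Z_hat = Z / np.sqrt(deg)[:, None]
    # Left singular vectors of D^{-1/2} Z are eigenvectors of D^{-1/2} W D^{-1/2}.
    # A sparse iterative SVD solver would replace this dense SVD at scale.
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    U = U[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalise (NJW style)
    return kmeans(U, k, seed)
```

The design point the sketch tries to capture is that the similarity graph is never materialised: degrees come from two products with Z, and the spectral embedding comes from the SVD of the rescaled Z rather than an eigendecomposition of an N x N matrix. The linear-in-N cost claimed in the abstract would additionally require Z to be stored sparsely (it has exactly `n_grids` nonzeros per row) and an iterative solver that only needs products with Z and its transpose.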
Pages: 2506-2515
Number of pages: 10
Related papers
50 in total
  • [11] Scalable model-based cluster analysis using clustering features
    Jin, HD
    Leung, KS
    Wong, ML
    Xu, ZB
    PATTERN RECOGNITION, 2005, 38 (05) : 637 - 649
  • [12] ACCELERATED SPECTRAL CLUSTERING USING GRAPH FILTERING OF RANDOM SIGNALS
    Tremblay, Nicolas
    Puy, Gilles
    Borgnat, Pierre
    Gribonval, Remi
    Vandergheynst, Pierre
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 4094 - 4098
  • [13] Enabling scalable spectral clustering for image segmentation
    Tung, Frederick
    Wong, Alexander
    Clausi, David A.
    PATTERN RECOGNITION, 2010, 43 (12) : 4069 - 4076
  • [14] A scalable approach to spectral clustering with SDD solvers
    Nguyen Lu Dang Khoa
    Chawla, Sanjay
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2015, 44 (02) : 289 - 308
  • [17] Scalable density-based clustering with quality guarantees using random projections
    Schneider, Johannes
    Vlachos, Michail
    DATA MINING AND KNOWLEDGE DISCOVERY, 2017, 31 (04) : 972 - 1005
  • [18] Segmentation of Heart Sound by Clustering Using Spectral and Temporal Features
    Khalid, Shah
    Hassan, Ali
    Ullah, Sana
    Riaz, Farhan
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2019, 868 : 337 - 346
  • [19] Crack Detection using Spectral Clustering Based on Crack Features
    Matsuoka, Takumi
    Matsushima, Kousuke
    PROCEEDINGS OF THE 2016 IEEE REGION 10 CONFERENCE (TENCON), 2016, : 2575 - 2578
  • [20] BBCA: Improving the scalability of *BEAST using random binning
    Zimmermann, Théo
    Mirarab, Siavash
    Warnow, Tandy
    BMC GENOMICS, 15