Scalable Spectral Clustering Using Random Binning Features

Cited by: 28
Authors
Wu, Lingfei [1]
Chen, Pin-Yu [1]
Yen, Ian En-Hsu [2]
Xu, Fangli [3]
Xia, Yinglong [4]
Aggarwal, Charu [1]
Affiliations
[1] IBM Res AI, Armonk, NY 10504 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Coll William & Mary, Williamsburg, VA 23187 USA
[4] Huawei Res, Shenzhen, Peoples R China
Keywords
Spectral clustering; Graph Construction; Random Binning Features; Eigendecomposition of Graph; PRIMME
DOI
10.1145/3219819.3220090
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spectral clustering is one of the most effective clustering approaches for capturing hidden cluster structures in data. However, it does not scale well to large problems because both constructing the similarity graph and computing the subsequent eigendecomposition have quadratic complexity. Although a number of methods have been proposed to accelerate spectral clustering, most of them incur considerable loss of information from the original data in order to reduce these computational bottlenecks. In this paper, we present a novel scalable spectral clustering method that uses Random Binning (RB) features to accelerate similarity graph construction and eigendecomposition simultaneously. Specifically, we implicitly approximate the graph similarity (kernel) matrix by the inner product of a large sparse feature matrix generated by RB. We then introduce a state-of-the-art SVD solver to efficiently compute the eigenvectors of this large matrix for spectral clustering. Using these two building blocks, we reduce the computational cost from quadratic to linear in the number of data points while achieving similar accuracy. Our theoretical analysis shows that spectral clustering via RB converges to exact spectral clustering faster than the standard Random Feature approximation does. Extensive experiments on 8 benchmarks show that the proposed method either outperforms or matches state-of-the-art methods in both accuracy and runtime. Moreover, our method exhibits linear scalability in both the number of data samples and the number of RB features.
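The first building block in the abstract, approximating a kernel matrix by inner products of sparse RB features, can be illustrated with a minimal standard-library sketch. It assumes a Laplacian kernel exp(-||x - y||_1 / sigma), which the abstract does not specify, and all function names here are hypothetical rather than taken from the authors' code. Each random grid assigns a point to exactly one bin, so the feature vector is a one-hot indicator per grid, and the inner product of two such vectors (scaled by the number of grids) is the fraction of grids in which the two points share a bin. The eigendecomposition step is not shown.

```python
import math
import random


def make_rb_grids(num_grids, dim, sigma, rng):
    """Draw random grids; each grid holds one (width, offset) pair per dimension.

    Assumption: Laplacian kernel, for which the bin width follows a
    Gamma(shape=2, scale=sigma) distribution.
    """
    grids = []
    for _ in range(num_grids):
        grid = []
        for _ in range(dim):
            width = rng.gammavariate(2.0, sigma)
            offset = rng.uniform(0.0, width)
            grid.append((width, offset))
        grids.append(grid)
    return grids


def rb_features(x, grids):
    """Return one bin id per grid (a tuple of integer cell coordinates).

    This is a compact encoding of the sparse one-hot feature vector:
    only the index of the single nonzero entry per grid is stored.
    """
    return [
        tuple(math.floor((xi - offset) / width)
              for xi, (width, offset) in zip(x, grid))
        for grid in grids
    ]


def rb_similarity(fx, fy):
    """Inner product of two RB feature vectors, each scaled by 1/sqrt(R).

    Equals the fraction of the R grids in which both points share a bin.
    """
    return sum(a == b for a, b in zip(fx, fy)) / len(fx)


def laplacian_kernel(x, y, sigma):
    """Exact Laplacian kernel, used only to check the approximation."""
    return math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / sigma)


rng = random.Random(0)
sigma = 1.0
grids = make_rb_grids(num_grids=4000, dim=2, sigma=sigma, rng=rng)

x, y = (0.2, 0.5), (0.6, 0.1)
approx = rb_similarity(rb_features(x, grids), rb_features(y, grids))
exact = laplacian_kernel(x, y, sigma)
print(f"RB estimate: {approx:.3f}, exact Laplacian kernel: {exact:.3f}")
```

Because each point touches only one bin per grid, the cost of building the features grows linearly with the number of points and the number of grids, which is consistent with the linear scalability the abstract reports. The estimate tightens as `num_grids` increases.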
Pages: 2506-2515
Page count: 10