Scalable spectral clustering with cosine similarity

Authors
Chen, Guangliang [1]
Affiliations
[1] San Jose State Univ, Dept Math & Stat, San Jose, CA 95192 USA
Keywords
DATA SETS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a unified scalable computing framework for three versions of spectral clustering: Normalized Cut (Shi and Malik, 2000), the Ng-Jordan-Weiss (NJW) algorithm (2001), and Diffusion Maps (Coifman and Lafon, 2006), all in the setting of cosine similarity. We assume that the input data is either sparse (e.g., a document-term frequency matrix) or of only a few hundred dimensions (e.g., small images or data obtained through PCA). We show that in such cases, spectral clustering can be implemented solely through efficient operations on the data matrix, such as elementwise manipulation, matrix-vector multiplication and low-rank SVD, thus entirely avoiding the weight matrix. Our algorithm is simple to implement, fast to run, accurate and robust to outliers. We demonstrate its superior performance through extensive experiments that compare our scalable algorithm with the plain implementation on several benchmark data sets.
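The idea of working only with the data matrix can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes nonnegative data with no all-zero rows, keeps the self-similarity on the diagonal of the weight matrix W = X X^T for simplicity, and uses a dense thin SVD from NumPy where a truncated sparse solver would be used at scale. The function name `cosine_spectral_embedding` is hypothetical.

```python
import numpy as np

def cosine_spectral_embedding(X, k):
    """NJW-style embedding from the data matrix alone (illustrative sketch).

    Assumes nonnegative entries and no all-zero rows, and keeps the
    diagonal of W = X X^T (a simplification made for this example).
    """
    X = np.asarray(X, dtype=float)
    # Scale rows to unit length so that X @ X.T would hold cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Degrees of W = X X^T from two matrix-vector products: d = X (X^T 1).
    d = X @ (X.T @ np.ones(X.shape[0]))
    # Scale row i by d_i^{-1/2}; then X_tilde @ X_tilde.T = D^{-1/2} W D^{-1/2}.
    X_tilde = X / np.sqrt(d)[:, None]
    # Left singular vectors of X_tilde are eigenvectors of D^{-1/2} W D^{-1/2},
    # so the n-by-n weight matrix is never formed.
    U, s, _ = np.linalg.svd(X_tilde, full_matrices=False)
    U_k = U[:, :k]
    # NJW step: normalize each embedded row to unit length before k-means.
    E = U_k / np.linalg.norm(U_k, axis=1, keepdims=True)
    return E, s[:k]
```

The rows of the returned embedding would then be passed to k-means with k clusters. For large sparse inputs, the dense `np.linalg.svd` call could be swapped for a truncated solver such as `scipy.sparse.linalg.svds`, which computes only the leading singular vectors.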
Pages: 314-319
Number of pages: 6