A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data

Cited by: 121
Authors
Peng, Xi [1]
Tang, Huajin [2]
Zhang, Lei [2]
Yi, Zhang [2]
Xiao, Shijie [3]
Affiliations
[1] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 138632, Singapore
[2] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China
Keywords
Error bound analysis; least square regression (LSR); low-rank representation (LRR); out-of-sample problem; scalable subspace clustering; sparse subspace clustering (SSC); sparse representation; collaborative representation; rank representation; face recognition; spectral methods; segmentation; kernel
DOI
10.1109/TNNLS.2015.2490080
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Under the framework of spectral clustering, the key to subspace clustering is building a similarity graph that describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and ℓ2-norm-based representations and have achieved state-of-the-art performance. However, these methods suffer from two limitations. First, their time complexity is at least proportional to the cube of the data size, which makes them inefficient for large-scale problems. Second, they cannot cope with out-of-sample data, that is, data not used to construct the similarity graph: to cluster each out-of-sample datum, they have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes representation-based subspace clustering algorithms feasible for clustering both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it into an out-of-sample problem through sampling, clustering, coding, and classifying. Furthermore, we estimate the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering large-scale data sets.
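The four steps named in the abstract (sampling, clustering, coding, classifying) can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything specific here is an assumption made for illustration: least square regression (ridge) codes as the representation, the regularization value `lam`, a minimum-residual rule for classifying out-of-sample points, a small NumPy-only spectral clustering step with farthest-first k-means, and the hypothetical function names `lsr_codes`, `spectral_labels`, `cluster_large_scale`, and `two_subspace_demo`. It should not be read as the authors' implementation.

```python
import numpy as np

def lsr_codes(D, Y, lam=1e-3):
    """Ridge-regression codes of the columns of Y over the dictionary D."""
    G = D.T @ D + lam * np.eye(D.shape[1])
    return np.linalg.solve(G, D.T @ Y)

def spectral_labels(W, k, n_iter=20):
    """Normalized spectral clustering with a tiny farthest-first k-means."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))          # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    U = vecs[:, -k:]                         # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    centers = [U[0]]
    for _ in range(k - 1):                   # farthest-first initialization
        dist = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):                  # Lloyd iterations
        gaps = np.linalg.norm(U[:, None] - centers[None], axis=2)
        labels = np.argmin(gaps, axis=1)
        centers = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def cluster_large_scale(X, k, p, lam=1e-3, seed=0):
    """Cluster the columns of X using only p sampled points for the graph."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # 1) Sampling: p in-sample points, the rest are treated as out-of-sample.
    idx = rng.permutation(n)
    in_idx, out_idx = idx[:p], idx[p:]
    D, Y = X[:, in_idx], X[:, out_idx]
    # 2) Clustering: self-representation graph on the in-sample data only.
    C = lsr_codes(D, D, lam)
    np.fill_diagonal(C, 0.0)
    W = np.abs(C) + np.abs(C).T
    in_labels = spectral_labels(W, k)
    # 3) Coding: represent every out-of-sample point over the in-sample data.
    A = lsr_codes(D, Y, lam)
    # 4) Classifying: assign each point to the cluster with the smallest
    #    reconstruction residual (an assumed rule, see the lead-in).
    resid = np.stack([
        np.linalg.norm(Y - D[:, in_labels == j] @ A[in_labels == j], axis=0)
        for j in range(k)
    ])
    labels = np.empty(n, dtype=int)
    labels[in_idx] = in_labels
    labels[out_idx] = np.argmin(resid, axis=0)
    return labels

def two_subspace_demo(n_per=100, seed=0):
    """Toy data: two orthogonal 2-D subspaces of R^4, noise-free."""
    rng = np.random.default_rng(seed)
    X = np.zeros((4, 2 * n_per))
    X[:2, :n_per] = rng.standard_normal((2, n_per))
    X[2:, n_per:] = rng.standard_normal((2, n_per))
    return X, np.repeat([0, 1], n_per)

if __name__ == "__main__":
    X, truth = two_subspace_demo()
    labels = cluster_large_scale(X, k=2, p=40)
    agree = np.mean(labels == truth)
    print("accuracy up to label swap:", max(agree, 1 - agree))
```

In this sketch the cubic-cost step (building and decomposing the graph) touches only the `p` sampled points, while each remaining point costs one linear solve against the sampled data, which is the sense in which the large-scale problem is converted into an out-of-sample one.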
Pages: 2499-2512
Number of pages: 14
Related papers
50 in total
  • [41] Large-Scale Multi-View Subspace Clustering in Linear Time
    Kang, Zhao
    Zhou, Wangtao
    Zhao, Zhitong
    Shao, Junming
    Han, Meng
    Xu, Zenglin
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 4412 - 4419
  • [42] An out-of-sample framework for TOPSIS-based classifiers with application in bankruptcy prediction
    Ouenniche, Jamal
    Perez-Gladish, Blanca
    Bouslah, Kais
    TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE, 2018, 131 : 111 - 116
  • [43] CNN-Based Joint Clustering and Representation Learning with Feature Drift Compensation for Large-Scale Image Data
    Hsu, Chih-Chung
    Lin, Chia-Wen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (02) : 421 - 429
  • [44] Clustering-based k-nearest neighbor classification for large-scale data with neural codes representation
    Gallego, Antonio-Javier
    Calvo-Zaragoza, Jorge
    Valero-Mas, Jose J.
    Rico-Juan, Juan R.
    PATTERN RECOGNITION, 2018, 74 : 531 - 543
  • [46] Parallel gravitational clustering based on grid partitioning for large-scale data
    Chen, Lei
    Chen, Fadong
    Liu, Zhaohua
    Lv, Mingyang
    He, Tingqin
    Zhang, Shiwen
    APPLIED INTELLIGENCE, 2023, 53 (03) : 2506 - 2526
  • [47] Fuzzy clustering algorithm based on multiple medoids for large-scale data
    Chen A.-G.
    Wang S.-T.
    Kongzhi yu Juece/Control and Decision, 2016, 31 (12) : 2122 - 2130
  • [48] CLUSTERING LARGE-SCALE DATA BASED ON MODIFIED AFFINITY PROPAGATION ALGORITHM
    Serdah, Ahmed M.
    Ashour, Wesam M.
    JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2016, 6 (01) : 23 - 33
  • [49] A Fast Semi-Supervised Clustering Framework for Large-Scale Time Series Data
    He, Guoliang
    Pan, Yanzhou
    Xia, Xuewen
    He, Jinrong
    Peng, Rong
    Xiong, Neal N.
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (07) : 4201 - 4216
  • [50] Systematic Topology Design for Large-Scale Networks: A Unified Framework
    Chang, Yijia
    Huang, Xi
    Deng, Longxiulin
    Shao, Ziyu
    Zhang, Junshan
    IEEE INFOCOM 2020 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2020 : 347 - 356