A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data

Cited by: 121

Authors:

Peng, Xi [1]

Tang, Huajin [2]

Zhang, Lei [2]

Yi, Zhang [2]

Xiao, Shijie [3]

Affiliations:

[1] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore 138632, Singapore

[2] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China

[3] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2016, Vol. 27, No. 12

Funding:

National Natural Science Foundation of China

Keywords:

Error bound analysis; least square regression (LSR); low-rank representation (LRR); out-of-sample problem; scalable subspace clustering; sparse subspace clustering (SSC); SPARSE REPRESENTATION; COLLABORATIVE REPRESENTATION; RANK REPRESENTATION; FACE RECOGNITION; SPECTRAL METHODS; SEGMENTATION; KERNEL

DOI:

10.1109/TNNLS.2015.2490080

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Under the framework of spectral clustering, the key to subspace clustering is building a similarity graph that describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and ℓ2-norm-based representations and have achieved state-of-the-art performance. However, these methods suffer from two limitations. First, their time complexity is at least proportional to the cube of the data size, which makes them inefficient for large-scale problems. Second, they cannot cope with out-of-sample data, i.e., data not used to construct the similarity graph: to cluster each out-of-sample datum, they have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes representation-based subspace clustering algorithms feasible for clustering both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it into an out-of-sample problem in the manner of sampling, clustering, coding, and classifying. Furthermore, we give an estimation of the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering large-scale data sets.
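The four steps named in the abstract (sampling, clustering, coding, classifying) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes ℓ2-regularized (ridge) coding for both the in-sample graph and the out-of-sample codes, a simplified spectral step with a deterministic k-means, and a minimum-residual rule for classification. The function names `ridge_codes`, `spectral_labels`, and `cluster_large_scale`, and the parameters `lam`, `n_samples`, and `seed`, are hypothetical.

```python
import numpy as np

def ridge_codes(D, Y, lam=1e-3):
    """Code each column of Y over dictionary D by l2-regularized least squares."""
    G = D.T @ D + lam * np.eye(D.shape[1])
    return np.linalg.solve(G, D.T @ Y)

def spectral_labels(W, k, n_iter=50):
    """Normalized spectral clustering on a symmetric affinity matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    E = vecs[:, :k]                          # k smallest eigenvectors
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    # Deterministic farthest-point initialization for a small k-means.
    centers = [E[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    sq = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq.argmin(axis=1)

def cluster_large_scale(X, k, n_samples, lam=1e-3, seed=0):
    """X holds one data point per column. Returns one label per column."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # 1) Sampling: pick a small in-sample subset uniformly at random.
    in_idx = rng.choice(n, size=n_samples, replace=False)
    out_idx = np.setdiff1d(np.arange(n), in_idx)
    Xin, Xout = X[:, in_idx], X[:, out_idx]
    # 2) Clustering: self-representation graph on the in-sample data only.
    #    Zeroing the diagonal after solving is a simplification (assumption).
    C = ridge_codes(Xin, Xin, lam)
    np.fill_diagonal(C, 0.0)
    W = np.abs(C) + np.abs(C).T
    in_labels = spectral_labels(W, k)
    # 3) Coding: represent out-of-sample points over the in-sample data.
    A = ridge_codes(Xin, Xout, lam)
    # 4) Classifying: assign to the cluster with the smallest residual (assumption).
    residuals = np.stack([
        np.linalg.norm(Xout - Xin[:, in_labels == j] @ A[in_labels == j], axis=0)
        for j in range(k)
    ])
    labels = np.empty(n, dtype=int)
    labels[in_idx] = in_labels
    labels[out_idx] = residuals.argmin(axis=0)
    return labels
```

In this sketch the cubic-cost spectral step runs only on the `n_samples` sampled columns, while every remaining point needs just one linear solve against the in-sample data, which is the sense in which the large-scale problem is treated as an out-of-sample problem.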


Pages: 2499-2512

Number of pages: 14
