Understanding the Generalization Performance of Spectral Clustering Algorithms

Citations: 0
Authors
Li, Shaojie
Ouyang, Sheng
Liu, Yong [1]
Affiliation
[1] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
CONSISTENCY; CONVERGENCE; CUTS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The theoretical analysis of spectral clustering is mainly devoted to consistency, while there is little research on its generalization performance. In this paper, we study the excess risk bounds of the popular spectral clustering algorithms: relaxed RatioCut and relaxed NCut. Our analysis follows the two practical steps of spectral clustering algorithms: continuous solution and discrete solution. Firstly, we provide the convergence rate of the excess risk bounds between the empirical continuous optimal solution and the population-level continuous optimal solution. Secondly, we show the fundamental quantity influencing the excess risk between the empirical discrete optimal solution and the population-level discrete optimal solution. At the empirical level, algorithms can be designed to reduce this quantity. Based on our theoretical analysis, we propose two novel algorithms that can penalize this quantity and, additionally, can cluster the out-of-sample data without re-eigendecomposition on the overall samples. Numerical experiments on toy and real datasets confirm the effectiveness of our proposed algorithms.
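The two steps named in the abstract (a continuous solution followed by a discrete solution) can be illustrated for relaxed RatioCut with two clusters. This is a minimal sketch of the textbook procedure, not the authors' proposed algorithms: the function names, the Gaussian similarity with bandwidth `sigma`, the power-iteration solver, and the sign-based discretization are all assumptions chosen for illustration.

```python
import math

def gaussian_similarity(points, sigma=1.0):
    """Similarity matrix W with W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2)), zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W

def fiedler_vector(W, iters=500):
    """Continuous step: eigenvector of the unnormalized Laplacian L = D - W
    for its second-smallest eigenvalue, via power iteration on (c*I - L)
    with the constant eigenvector projected out at every step."""
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)  # Gershgorin bound: all eigenvalues of L lie in [0, c]
    v = [math.sin(i + 1.0) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        # (c*I - L) v = (c - deg) * v + W v
        w = [(c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def spectral_bipartition(points, sigma=1.0):
    """Discrete step (simplified): threshold the continuous solution at zero."""
    v = fiedler_vector(gaussian_similarity(points, sigma))
    return [1 if x > 0 else 0 for x in v]

if __name__ == "__main__":
    # Toy data: two well-separated groups of three points each.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(spectral_bipartition(pts))
```

For more than two clusters, the usual discretization runs k-means on the rows of the first k eigenvectors instead of thresholding a single vector; the sign rule above is only the simplest two-cluster case.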
Pages: 8614-8621
Number of pages: 8