Representation Learning for Clustering: A Statistical Framework

Cited by: 0
Authors
Ashtiani, Hassan [1]
Ben-David, Shai [1]
Affiliations
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON, Canada
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which k-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity of a class of possible representations, in the spirit of the VC-dimension, and show that classes of representations for which this dimension is finite can be successfully learned with sample size error bounds. We end our discussion with an analysis of that dimension for classes of representations induced by linear embeddings.
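
The protocol described in the abstract can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the finite list of candidate linear maps, the pairwise-disagreement loss, the synthetic data, and the helper names `kmeans`, `pair_disagreement`, and `select_representation` are all hypothetical. The sketch picks, from the candidates, the linear embedding under which k-means on the user-labelled sample best matches the user's clustering.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def pair_disagreement(a, b):
    """Fraction of point pairs on which two clusterings disagree (1 - Rand index)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] != same_b[iu]))

def select_representation(candidates, X_sample, user_labels, k):
    """Return the index of the candidate map with the lowest sample disagreement."""
    losses = [
        pair_disagreement(kmeans(X_sample @ A.T, k), user_labels)
        for A in candidates
    ]
    return int(np.argmin(losses)), losses

# Synthetic sample (assumed): the user's clusters differ along coordinate 0,
# while coordinate 1 carries large variation unrelated to the user's clustering.
x0 = np.concatenate([np.linspace(0.0, 1.0, 20), np.linspace(10.0, 11.0, 20)])
x1 = np.tile([-50.0, 50.0], 20)
X_sample = np.column_stack([x0, x1])
user_labels = np.array([0] * 20 + [1] * 20)

# Hypothetical finite class of linear embeddings: projections onto each axis.
candidates = [np.array([[0.0, 1.0]]), np.array([[1.0, 0.0]])]

best, losses = select_representation(candidates, X_sample, user_labels, k=2)
print(best, losses)
```

In this toy setting the projection onto coordinate 0 reproduces the user's clustering on the sample, while the projection onto coordinate 1 does not. Under the protocol, the selected map would then be applied to the full data set before running k-means.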
Pages: 82-91
Page count: 10