Representation Learning for Clustering: A Statistical Framework

Cited by: 0
Authors
Ashtiani, Hassan [1]
Ben-David, Shai [1]
Affiliations
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON, Canada
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which k-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity of a class of possible representations, in the spirit of the VC-dimension, showing that classes of representations that have finite such dimension can be successfully learned with sample size error bounds, and end our discussion with an analysis of that dimension for classes of representations induced by linear embeddings.
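The protocol described in the abstract can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' algorithm or analysis: the candidate class (three hypothetical 2x2 linear maps), the pair-counting disagreement measure, the farthest-first initialization of k-means, the synthetic data, and every function name below are invented for the example.

```python
import itertools
import random

def embed(matrix, point):
    """Apply a linear map (given as a tuple of rows) to one point."""
    return tuple(sum(m * x for m, x in zip(row, point)) for row in matrix)

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first initialization
    (an assumption made here so the sketch needs no random restarts)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels

def disagreement(labels_a, labels_b):
    """Fraction of point pairs that one clustering puts together and the
    other separates (an assumed stand-in for the paper's loss)."""
    pairs = list(itertools.combinations(range(len(labels_a)), 2))
    if not pairs:
        return 0.0
    bad = sum((labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
              for i, j in pairs)
    return bad / len(pairs)

def learn_representation(points, sample_idx, sample_labels, candidates, k):
    """Return the name of the candidate map under which k-means on the
    user-clustered sample disagrees least with the user's clustering."""
    sample = [points[i] for i in sample_idx]
    def loss(name):
        mapped = [embed(candidates[name], p) for p in sample]
        return disagreement(kmeans(mapped, k), sample_labels)
    return min(candidates, key=loss)

# Hypothetical finite class of linear embeddings of 2-D data.
CANDIDATES = {
    "identity": ((1.0, 0.0), (0.0, 1.0)),
    "keep_feature_0": ((1.0, 0.0), (0.0, 0.0)),
    "keep_feature_1": ((0.0, 0.0), (0.0, 1.0)),
}

def make_data(n=40):
    """Synthetic data: the user's clusters depend on feature 0 only,
    while feature 1 is a large-scale nuisance direction."""
    points, user_labels = [], []
    for i in range(n):
        group = i % 2
        side = (i // 2) % 2
        points.append((2.0 * group + 0.1 * (i % 3), 10.0 if side else -10.0))
        user_labels.append(group)
    return points, user_labels

if __name__ == "__main__":
    points, user_labels = make_data()
    # The user clusters only a small random sample of the data set.
    sample_idx = random.Random(0).sample(range(len(points)), 10)
    sample_labels = [user_labels[i] for i in sample_idx]
    best = learn_representation(points, sample_idx, sample_labels, CANDIDATES, k=2)
    full = kmeans([embed(CANDIDATES[best], p) for p in points], 2)
    print(best, disagreement(full, user_labels))
```

In this toy setting k-means on the raw data splits along the nuisance feature, so the sample clustering is what steers the choice of representation. The sample-complexity question studied in the paper is how large that sample must be for the chosen representation to generalize to the full data set.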
Pages: 82-91
Page count: 10