Representation Learning for Clustering: A Statistical Framework

Authors
Ashtiani, Hassan [1]
Ben-David, Shai [1]
Affiliation
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON, Canada
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which k-means clustering results in a clustering (of the full data set) that is aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm. We then introduce a notion of capacity for a class of possible representations, in the spirit of the VC-dimension, and show that classes of representations with finite capacity can be learned successfully, with error bounds that depend on the sample size. We end with an analysis of that capacity for classes of representations induced by linear embeddings.
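The protocol described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a tiny, hypothetical class of candidate representations (diagonal linear embeddings given as per-coordinate weights), uses Lloyd's heuristic in place of an exact k-means solver, and scores agreement with the user's clustering by a pair-counting disagreement. The names `apply_map`, `kmeans`, `pair_disagreement`, and `learn_representation` are illustrative, and a systematic sample stands in for the random sample so that the example is deterministic.

```python
import itertools

def apply_map(weights, point):
    """Hypothetical diagonal linear embedding: scale each coordinate."""
    return tuple(w * x for w, x in zip(weights, point))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Lloyd's heuristic with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def pair_disagreement(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings disagree
    about whether the pair belongs to the same cluster."""
    pairs = list(itertools.combinations(range(len(labels_a)), 2))
    bad = sum((labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
              for i, j in pairs)
    return bad / len(pairs)

def learn_representation(sample, user_labels, candidates, k):
    """Pick the candidate map whose k-means clustering of the sample
    is closest to the clustering supplied by the user."""
    def loss(weights):
        mapped = [apply_map(weights, p) for p in sample]
        return pair_disagreement(kmeans(mapped, k), user_labels)
    return min(candidates, key=loss)

# Toy data: the user's notion of a cluster depends only on coordinate 0,
# while coordinate 1 has a large, irrelevant spread.
data = [(float(i % 2), -10.0 + 20.0 * (i // 2) / 99.0) for i in range(200)]
true_labels = [i % 2 for i in range(200)]

# The user clusters a small sample (systematic here, random in the protocol).
sample_idx = list(range(0, 200, 7))
sample = [data[i] for i in sample_idx]
user_labels = [true_labels[i] for i in sample_idx]

candidates = [(1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]  # assumed candidate class
best = learn_representation(sample, user_labels, candidates, k=2)

# Cluster the full data set under the learned representation.
full_labels = kmeans([apply_map(best, p) for p in data], 2)
full_loss = pair_disagreement(full_labels, true_labels)
```

On this toy data the identity map lets the irrelevant coordinate dominate k-means, whereas the map that keeps only coordinate 0 reproduces the user's clustering on the sample and, in this case, on the full data set as well.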
Pages: 82-91
Page count: 10