Representation Learning for Clustering: A Statistical Framework

Authors
Ashtiani, Hassan [1]
Ben-David, Shai [1]
Affiliation
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON, Canada
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to construct a data representation under which k-means clustering of the full data set yields a clustering aligned with the user's clustering. We provide a formal statistical model for analyzing the sample complexity of learning a clustering representation under this paradigm. We then introduce a notion of capacity for a class of candidate representations, in the spirit of the VC-dimension, and show that classes of representations with finite capacity of this kind can be learned successfully, with bounds on the sample size needed to reach a given error. We conclude with an analysis of this dimension for classes of representations induced by linear embeddings.
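The protocol described in the abstract can be illustrated with a small sketch. It rests on several assumptions of our own that are not taken from the paper. The representation class is a finite list of candidate linear maps. The designer picks the map whose k-means clustering of the embedded sample disagrees least with the user's clustering, which is an empirical-risk-minimization rule assumed here. Disagreement is measured as the fraction of mismatched points after the best relabeling. k-means is Lloyd's algorithm with deterministic farthest-first initialization. The function names (`embed`, `kmeans`, `clustering_distance`, `select_representation`) and the toy data are hypothetical, so this should not be read as the authors' algorithm.

```python
import itertools
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Matrix = Sequence[Sequence[float]]  # one row per output dimension


def embed(points: Sequence[Vector], w: Matrix) -> List[Vector]:
    """Apply the linear map x -> W x to every point (a hypothetical representation)."""
    return [tuple(sum(r * x for r, x in zip(row, p)) for row in w) for p in points]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int, iters: int = 100) -> List[int]:
    """Lloyd's algorithm; farthest-first initialization is an assumption chosen for determinism."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(iters):
        # Assign each point to its nearest center (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def clustering_distance(a: Sequence[int], b: Sequence[int], k: int) -> float:
    """Assumed loss: fraction of points whose labels differ, minimized over relabelings of b."""
    best = len(a)
    for perm in itertools.permutations(range(k)):
        best = min(best, sum(x != perm[y] for x, y in zip(a, b)))
    return best / len(a)


def select_representation(
    sample: Sequence[Vector], user_labels: Sequence[int], candidates: Dict[str, Matrix], k: int
) -> Tuple[str, Dict[str, float]]:
    """Assumed ERM rule: pick the candidate map whose k-means output best matches the user."""
    losses = {
        name: clustering_distance(user_labels, kmeans(embed(sample, w), k), k)
        for name, w in candidates.items()
    }
    return min(losses, key=losses.get), losses


if __name__ == "__main__":
    # Toy sample (hypothetical): the user groups points by x, but y dominates the distances.
    sample = [(0, -10), (0, -9), (0, 9), (0, 10), (1, -10), (1, -9), (1, 9), (1, 10)]
    user_labels = [0, 0, 0, 0, 1, 1, 1, 1]
    candidates = {"identity": [[1, 0], [0, 1]], "x_only": [[1, 0]], "y_only": [[0, 1]]}
    best, losses = select_representation(sample, user_labels, candidates, k=2)
    print(best, losses)  # prints: x_only {'identity': 0.5, 'x_only': 0.0, 'y_only': 0.5}
```

In this toy sample, k-means on the raw coordinates splits the points along y because the y-spread dominates Euclidean distance. Projecting onto x recovers the user's grouping exactly. Searching a short candidate list by brute force keeps the sketch small. It does not capture the paper's capacity analysis, which concerns classes of representations induced by linear embeddings rather than a handful of fixed maps.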
Pages: 82-91
Number of pages: 10