InfoNCE Loss Provably Learns Cluster-Preserving Representations

Cited by: 0
Authors
Parulekar, Advait [1 ]
Collins, Liam [1 ]
Shanmugam, Karthikeyan [2 ]
Mokhtari, Aryan [1 ]
Shakkottai, Sanjay [1 ]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Google Res India, Mumbai, Maharashtra, India
Keywords
Contrastive learning; Representation learning; Self-supervised learning;
DOI
None available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Numbers
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The goal of contrastive learning is to learn a representation that preserves the underlying clusters by keeping samples with similar content, e.g. the "dogness" of a dog, close to each other in the space generated by the representation. A common and successful approach to this unsupervised learning problem is minimizing the InfoNCE loss associated with the training samples, where each sample is paired with its augmentations (positive samples, such as rotations and crops) and a batch of negative samples (unrelated samples). To the best of our knowledge, it was an open question whether the representation learned by minimizing the InfoNCE loss preserves the underlying data clusters, since the loss only promotes learning a representation that is faithful to the augmentations, i.e., an image and its augmentations have the same representation. Our main result shows that the representation learned by InfoNCE with a finite number of negative samples is also consistent with respect to the clusters in the data, provided that the augmentation sets within clusters, while possibly non-overlapping, are close and intertwined relative to the complexity of the learned function class.
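To make the loss the abstract refers to concrete, here is a minimal sketch of the InfoNCE loss for a single anchor, its positive (an augmentation), and a batch of negatives. The function name `info_nce_loss`, the cosine-similarity choice, and the temperature value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: the negative log-softmax score of the
    positive sample against the positive plus K negative samples.
    All representations are L2-normalized, so the similarity used in the
    logits is cosine similarity (an illustrative, common choice)."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a = normalize(anchor)          # shape (d,)
    p = normalize(positive)        # shape (d,)
    n = normalize(negatives)       # shape (K, d)

    pos_logit = np.dot(a, p) / temperature   # similarity to the augmentation
    neg_logits = (n @ a) / temperature       # similarities to unrelated samples
    logits = np.concatenate([[pos_logit], neg_logits])

    # Cross-entropy with the positive at index 0:
    # -log( exp(pos) / sum_j exp(logit_j) )
    return -pos_logit + np.log(np.sum(np.exp(logits)))
```

Minimizing this loss pushes the anchor toward its augmentation and away from the negatives; the paper's question is whether this alone suffices to keep whole clusters separated in the learned representation space.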
Pages: 48