InfoNCE Loss Provably Learns Cluster-Preserving Representations

Cited by: 0
Authors
Parulekar, Advait [1 ]
Collins, Liam [1 ]
Shanmugam, Karthikeyan [2 ]
Mokhtari, Aryan [1 ]
Shakkottai, Sanjay [1 ]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Google Res India, Mumbai, Maharashtra, India
Keywords
Contrastive learning; Representation learning; Self-supervised learning;
DOI
None available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Numbers
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The goal of contrastive learning is to learn a representation that preserves the underlying clusters by keeping samples with similar content, e.g. the "dogness" of a dog, close to each other in the space generated by the representation. A common and successful approach to this unsupervised learning problem is minimizing the InfoNCE loss associated with the training samples, where each sample is paired with its augmentations (positive samples, such as rotations and crops) and a batch of negative samples (unrelated samples). To the best of our knowledge, it was an open question whether the representation learned by minimizing the InfoNCE loss preserves the underlying data clusters, since the loss only promotes learning a representation that is faithful to the augmentations, i.e., an image and its augmentations have the same representation. Our main result shows that the representation learned by InfoNCE with a finite number of negative samples is also consistent with respect to the clusters in the data, provided that the augmentation sets within clusters, while possibly non-overlapping, are close and intertwined relative to the complexity of the learned function class.
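To make the loss the abstract refers to concrete, here is a minimal sketch of the InfoNCE loss for a single anchor, its positive (an augmentation), and a batch of negatives. The function name `info_nce_loss`, the cosine-similarity choice, and the temperature value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: the negative log-softmax score of the
    positive sample against the positive plus K negative samples.
    All representations are L2-normalized, so the similarity used in the
    logits is cosine similarity (an illustrative, common choice)."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a = normalize(anchor)          # shape (d,)
    p = normalize(positive)        # shape (d,)
    n = normalize(negatives)       # shape (K, d)

    pos_logit = np.dot(a, p) / temperature   # similarity to the augmentation
    neg_logits = (n @ a) / temperature       # similarities to unrelated samples
    logits = np.concatenate([[pos_logit], neg_logits])

    # Cross-entropy with the positive at index 0:
    # -log( exp(pos) / sum_j exp(logit_j) )
    return -pos_logit + np.log(np.sum(np.exp(logits)))
```

Minimizing this loss pushes the anchor toward its augmentation and away from the negatives; the paper's question is whether this alone suffices to keep whole clusters separated in the learned representation space.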
Pages: 48