Near-Optimal k-Clustering in the Sliding Window Model

Cited by: 0
Authors
Woodruff, David P. [1 ]
Zhong, Peilin [2 ]
Zhou, Samson [3 ]
Affiliations
[1] Carnegie Mellon University, Pittsburgh, PA 15213 USA
[2] Google Research, Mountain View, CA USA
[3] Texas A&M University, College Station, TX 77843 USA
Funding
National Science Foundation (USA);
Keywords
CORESETS; FRAMEWORK;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Clustering is an important technique for identifying structural information in large-scale data analysis, where the underlying dataset may be too large to store. In many applications, recent data provides more accurate information, so data older than a certain time is expired. The sliding window model captures these desired properties, and there has consequently been substantial interest in clustering in the sliding window model. In this paper, we give the first algorithm that achieves a near-optimal (1 + ε)-approximation to (k, z)-clustering in the sliding window model, where z is the exponent of the distance function in the cost. Our algorithm uses k/min(ε^4, ε^{2+z}) · polylog(nΔ/ε) words of space when the points are from [Δ]^d, significantly improving on works by Braverman et al. (SODA 2016), Borassi et al. (NeurIPS 2021), and Epasto et al. (SODA 2022). Along the way, we develop a data structure for clustering called an online coreset, which outputs a coreset not only for the end of a stream but also for all prefixes of the stream. Our online coreset samples k/min(ε^4, ε^{2+z}) · polylog(nΔ/ε) points from the stream. We then show that any online coreset requires Ω((k/ε^2) · log n) samples, which demonstrates a separation from the problem of constructing an offline coreset; that is, constructing online coresets is strictly harder. Our results also extend to general metrics on [Δ]^d and are near-optimal in light of an Ω(k/ε^{2+z}) lower bound on the size of an offline coreset.
Pages: 27
Related Papers
50 records in total (first 10 shown)
  • [1] Near-Optimal Private and Scalable k-Clustering
    Cohen-Addad, Vincent
    Epasto, Alessandro
    Mirrokni, Vahab
    Narayanan, Shyam
    Zhong, Peilin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [2] Sliding Window Algorithms for k-Clustering Problems
    Borassi, Michele
    Epasto, Alessandro
    Lattanzi, Silvio
Vassilvitskii, Sergei
    Zadimoghaddam, Morteza
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [3] Near-Optimal Clustering in the k-machine model
    Bandyapadhyay, Sayan
    Inamdar, Tanmay
    Pai, Shreyas
    Pemmaraju, Sriram V.
    ICDCN'18: PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING, 2018,
  • [4] Near-optimal clustering in the k-machine model
    Bandyapadhyay, Sayan
    Inamdar, Tanmay
    Pai, Shreyas
Pemmaraju, Sriram V.
    THEORETICAL COMPUTER SCIENCE, 2022, 899 : 80 - 97
  • [5] ALMOST OPTIMAL SOLUTIONS TO k-CLUSTERING PROBLEMS
    Kumar, Pankaj
    Kumar, Piyush
    INTERNATIONAL JOURNAL OF COMPUTATIONAL GEOMETRY & APPLICATIONS, 2010, 20 (04) : 431 - 447
  • [6] Near-optimal large-scale k-medoids clustering
    Ushakov, Anton V.
    Vasilyev, Igor
    INFORMATION SCIENCES, 2021, 545 : 344 - 362
  • [7] Near-Optimal Correlation Clustering with Privacy
    Cohen-Addad, Vincent
    Fan, Chenglin
    Lattanzi, Silvio
    Mitrovic, Slobodan
    Norouzi-Fard, Ashkan
    Parotsidis, Nikos
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [8] Near-Optimal Comparison Based Clustering
    Perrot, Michael
    Esser, Pascal Mattia
    Ghoshdastidar, Debarghya
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [9] Consistent k-Clustering
    Lattanzi, Silvio
    Vassilvitskii, Sergei
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [10] On approximate geometric k-clustering
    Matousek, J
    DISCRETE & COMPUTATIONAL GEOMETRY, 2000, 24 (01) : 61 - 84