Near-Optimal k-Clustering in the Sliding Window Model

Cited by: 0
Authors
Woodruff, David P. [1 ]
Zhong, Peilin [2 ]
Zhou, Samson [3 ]
Affiliations
[1] Carnegie Mellon University, Pittsburgh, PA 15213, USA
[2] Google Research, Mountain View, CA, USA
[3] Texas A&M University, College Station, TX 77843, USA
Funding
US National Science Foundation
Keywords
Coresets; Framework
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Clustering is an important technique for identifying structural information in large-scale data analysis, where the underlying dataset may be too large to store. In many applications, recent data provide more accurate information, and thus older data past a certain time are expired. The sliding window model captures these desired properties, and there has consequently been substantial interest in clustering in the sliding window model. In this paper, we give the first algorithm that achieves a near-optimal $(1+\varepsilon)$-approximation to $(k,z)$-clustering in the sliding window model, where $z$ is the exponent of the distance function in the cost. Our algorithm uses $\frac{k}{\min(\varepsilon^4,\varepsilon^{2+z})}\,\operatorname{polylog}\frac{n\Delta}{\varepsilon}$ words of space when the points are from $[\Delta]^d$, thus significantly improving on works by Braverman et al. (SODA 2016), Borassi et al. (NeurIPS 2021), and Epasto et al. (SODA 2022). Along the way, we develop a data structure for clustering called an online coreset, which outputs a coreset not only for the end of a stream but also for all prefixes of the stream. Our online coreset samples $\frac{k}{\min(\varepsilon^4,\varepsilon^{2+z})}\,\operatorname{polylog}\frac{n\Delta}{\varepsilon}$ points from the stream. We then show that any online coreset requires $\Omega\!\left(\frac{k}{\varepsilon^2}\log n\right)$ samples, which gives a separation from the problem of constructing an offline coreset, i.e., constructing online coresets is strictly harder. Our results also extend to general metrics on $[\Delta]^d$ and are near-optimal in light of an $\Omega\!\left(\frac{k}{\varepsilon^{2+z}}\right)$ lower bound for the size of an offline coreset.
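To make the abstract's two key objects concrete, here is a minimal, hedged Python sketch of the $(k,z)$-clustering cost and of the online-coreset interface (a summary that can be queried after every insertion, and hence is valid for all prefixes of the stream). All names below are hypothetical illustrations; the toy sampler keeps each point with a fixed probability, which yields an unbiased cost estimate but is not the paper's construction and carries none of its $(1+\varepsilon)$ guarantees or size bounds.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Weighted = Tuple[Point, float]  # (point, weight)

def dist(p: Point, q: Point) -> float:
    # Euclidean distance; the paper also handles general metrics on [Delta]^d.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kz_cost(weighted: Sequence[Weighted], centers: Sequence[Point], z: float) -> float:
    # (k, z)-clustering cost: sum_p w(p) * dist(p, C)^z.
    # z = 1 is k-median, z = 2 is k-means.
    return sum(w * min(dist(p, c) for c in centers) ** z for p, w in weighted)

class ToyOnlineCoreset:
    # Interface-only stand-in for an online coreset: after the t-th insert,
    # coreset() summarizes the prefix p_1, ..., p_t. Because points are only
    # ever appended, a single structure serves every prefix of the stream.
    def __init__(self, rate: float, seed: int = 0) -> None:
        self.rate = rate                      # sampling probability per point
        self.rng = random.Random(seed)
        self._sample: List[Weighted] = []

    def insert(self, p: Point) -> None:
        # Keep p with probability `rate`; weight survivors by 1/rate so that
        # kz_cost over the sample is an unbiased estimate of the true cost.
        if self.rng.random() < self.rate:
            self._sample.append((p, 1.0 / self.rate))

    def coreset(self) -> List[Weighted]:
        return list(self._sample)

# Query the same structure at several prefixes of one stream.
stream = [(float(i % 10), float(i % 7)) for i in range(1000)]
centers = [(2.0, 2.0), (7.0, 5.0)]
oc = ToyOnlineCoreset(rate=0.1, seed=42)
for t, p in enumerate(stream, 1):
    oc.insert(p)
    if t % 500 == 0:
        est = kz_cost(oc.coreset(), centers, z=2)
        exact = kz_cost([(q, 1.0) for q in stream[:t]], centers, z=2)
        print(f"prefix {t}: estimate {est:.1f} vs exact {exact:.1f}")
```

Per the abstract, the actual algorithm replaces this fixed-probability sampler with a construction that retains only $\frac{k}{\min(\varepsilon^4,\varepsilon^{2+z})}\,\operatorname{polylog}\frac{n\Delta}{\varepsilon}$ points while guaranteeing a $(1+\varepsilon)$-approximate cost for every prefix, which is what powers the sliding-window result.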
Pages: 27