Near-Optimal k-Clustering in the Sliding Window Model

Cited by: 0
Authors
Woodruff, David P. [1]
Zhong, Peilin [2]
Zhou, Samson [3]
Affiliations
[1] CMU, Pittsburgh, PA 15213 USA
[2] Google Res, Mountain View, CA USA
[3] Texas A&M Univ, College Stn, TX 77843 USA
Funding
National Science Foundation (USA);
Keywords
CORESETS; FRAMEWORK;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is an important technique for identifying structural information in large-scale data analysis, where the underlying dataset may be too large to store. In many applications, recent data provide more accurate information, so data older than a certain time are treated as expired. The sliding window model captures these properties, and there has consequently been substantial interest in clustering in this model. In this paper, we give the first algorithm that achieves a near-optimal $(1+\varepsilon)$-approximation to $(k,z)$-clustering in the sliding window model, where $z$ is the exponent of the distance function in the cost. Our algorithm uses $\frac{k}{\min(\varepsilon^4,\,\varepsilon^{2+z})}\,\mathrm{polylog}\,\frac{n\Delta}{\varepsilon}$ words of space when the points are from $[\Delta]^d$, thus significantly improving on works by Braverman et al. (SODA 2016), Borassi et al. (NeurIPS 2021), and Epasto et al. (SODA 2022). Along the way, we develop a data structure for clustering called an online coreset, which outputs a coreset not only for the end of a stream, but also for all prefixes of the stream. Our online coreset samples $\frac{k}{\min(\varepsilon^4,\,\varepsilon^{2+z})}\,\mathrm{polylog}\,\frac{n\Delta}{\varepsilon}$ points from the stream. We then show that any online coreset requires $\Omega\!\left(\frac{k}{\varepsilon^2}\log n\right)$ samples, which shows a separation from the problem of constructing an offline coreset, i.e., constructing online coresets is strictly harder. Our results also extend to general metrics on $[\Delta]^d$ and are near-optimal in light of an $\Omega\!\left(\frac{k}{\varepsilon^{2+z}}\right)$ lower bound for the size of an offline coreset.
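To make the terminology in the abstract concrete, the following is a minimal Python sketch of the $(k,z)$-clustering cost and of a sliding window over a stream. It is not the algorithm from the paper: the names `kz_cost`, `NaiveSlidingWindow`, and `satisfies_coreset_bound` are hypothetical, Euclidean distance is assumed, and the window is stored explicitly, which takes space linear in the window size (exactly what a small-space sliding-window algorithm avoids). The coreset check only tests the $(1 \pm \varepsilon)$ guarantee for one given set of centers, whereas a true coreset must satisfy it for every set of $k$ centers.

```python
from collections import deque
from math import dist


def kz_cost(points, centers, z, weights=None):
    """(k, z)-clustering cost: sum of weight * (distance to nearest center) ** z.

    z = 1 corresponds to k-median and z = 2 to k-means.
    Euclidean distance is an assumption of this sketch.
    """
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(dist(p, c) for c in centers) ** z
        for p, w in zip(points, weights)
    )


class NaiveSlidingWindow:
    """Baseline that keeps the W most recent points explicitly (O(W) space)."""

    def __init__(self, window_size):
        # A bounded deque drops the oldest point when a new one arrives,
        # which models expiry of data outside the window.
        self.window = deque(maxlen=window_size)

    def insert(self, point):
        self.window.append(point)

    def cost(self, centers, z):
        return kz_cost(list(self.window), centers, z)


def satisfies_coreset_bound(points, core_points, core_weights, centers, z, eps):
    """Check the (1 +/- eps) cost guarantee for ONE fixed set of centers."""
    full = kz_cost(points, centers, z)
    approx = kz_cost(core_points, centers, z, core_weights)
    return (1 - eps) * full <= approx <= (1 + eps) * full


if __name__ == "__main__":
    sw = NaiveSlidingWindow(window_size=3)
    for p in [(0, 0), (10, 10), (1, 0), (0, 1), (1, 1)]:
        sw.insert(p)
    # Only the last three points remain active in the window.
    print(list(sw.window))
    print(sw.cost(centers=[(0, 0)], z=2))
```

In this toy run the outlier `(10, 10)` has expired by the time the cost is queried, so it no longer contributes, which illustrates why the sliding window model suits applications where old data should be discarded.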
Pages: 27