Near-Optimal k-Clustering in the Sliding Window Model

Authors
Woodruff, David P. [1]
Zhong, Peilin [2]
Zhou, Samson [3]
Affiliations
[1] CMU, Pittsburgh, PA 15213 USA
[2] Google Res, Mountain View, CA USA
[3] Texas A&M Univ, College Stn, TX 77843 USA
Funding
National Science Foundation (USA);
Keywords
CORESETS; FRAMEWORK;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is an important technique for identifying structural information in large-scale data analysis, where the underlying dataset may be too large to store. In many applications, recent data provides more accurate information, so data older than a certain time is treated as expired. The sliding window model captures both properties, and there has consequently been substantial interest in clustering in this model. In this paper, we give the first algorithm that achieves a near-optimal $(1+\varepsilon)$-approximation to $(k, z)$-clustering in the sliding window model, where $z$ is the exponent of the distance function in the cost. Our algorithm uses $\frac{k}{\min(\varepsilon^4, \varepsilon^{2+z})}\,\mathrm{polylog}\frac{n\Delta}{\varepsilon}$ words of space when the points are from $[\Delta]^d$, thus significantly improving on works by Braverman et al. (SODA 2016), Borassi et al. (NeurIPS 2021), and Epasto et al. (SODA 2022). Along the way, we develop a data structure for clustering called an online coreset, which outputs a coreset not only for the end of a stream, but also for all prefixes of the stream. Our online coreset samples $\frac{k}{\min(\varepsilon^4, \varepsilon^{2+z})}\,\mathrm{polylog}\frac{n\Delta}{\varepsilon}$ points from the stream. We then show that any online coreset requires $\Omega\left(\frac{k}{\varepsilon^2}\log n\right)$ samples, which shows a separation from the problem of constructing an offline coreset, i.e., constructing online coresets is strictly harder. Our results also extend to general metrics on $[\Delta]^d$ and are near-optimal in light of an $\Omega\left(\frac{k}{\varepsilon^{2+z}}\right)$ lower bound for the size of an offline coreset.
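To make the two central notions in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes Euclidean distance and the standard $(k, z)$-clustering cost, i.e. the sum over points of the distance to the nearest center raised to the power $z$. The names `kz_cost` and `is_online_coreset_for` are hypothetical. A real online coreset must preserve the cost for every set of $k$ centers; this check tests only one candidate set of centers, so it can refute the property but cannot prove it.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def dist(p: Point, q: Point) -> float:
    """Euclidean distance (an assumption; the paper also covers general metrics)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def kz_cost(points: Sequence[Point], centers: Sequence[Point], z: float,
            weights: Optional[Sequence[float]] = None) -> float:
    """Weighted (k, z)-clustering cost: sum of w * (distance to nearest center)^z."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(dist(p, c) for c in centers) ** z
               for p, w in zip(points, weights))


def is_online_coreset_for(stream: Sequence[Point],
                          sample: List[Tuple[int, float]],
                          centers: Sequence[Point], z: float, eps: float) -> bool:
    """Check the (1 +/- eps) guarantee on EVERY prefix, for one fixed center set.

    `sample` holds (index into stream, weight) pairs; for the prefix of length t
    only the sampled points with index < t are available.
    """
    for t in range(1, len(stream) + 1):
        kept = [(stream[i], w) for i, w in sample if i < t]
        full = kz_cost(stream[:t], centers, z)
        approx = kz_cost([p for p, _ in kept], centers, z, [w for _, w in kept])
        if abs(approx - full) > eps * full:
            return False
    return True


stream = [(0.0, 0.0), (0.0, 0.0), (4.0, 0.0), (4.0, 0.0)]
centers = [(1.0, 0.0)]
offline_sample = [(1, 2.0), (3, 2.0)]  # matches the cost of the full stream only
print(is_online_coreset_for(stream, offline_sample, centers, z=2, eps=0.1))
```

In this toy stream the two-point weighted sample reproduces the cost of the whole stream exactly, yet fails on the length-1 prefix, where no sampled point has arrived. This is the intuition behind the separation stated in the abstract: a guarantee over all prefixes is a stronger requirement than a guarantee at the end of the stream.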
Pages: 27