On the Fly Detection of the Top-k Items in the Distributed Sliding Window Model

Cited by: 0
Authors
Anceaume, Emmanuelle [1]
Busnel, Yann [2]
Cazacu, Vasile [1]
Affiliations
[1] IRISA, CNRS, Rennes, France
[2] IMT Atlantique, IRISA, Cesson Sevigne, France
Keywords
DOI
Not available
CLC classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents a new algorithm that detects, on the fly, the k most frequent items in the sliding window model. The algorithm is distributed across the nodes of the system. It is inspired by a recent and innovative approach that associates with each item a stochastic value correlated with its frequency, instead of trying to estimate its number of occurrences. This stochastic value is the number of consecutive heads obtained when flipping a coin until the first tail occurs. The original approach retained only the maximum number of consecutive heads obtained by each item, since a frequently occurring item has a higher probability of reaching a high value. While effective for very skewed data distributions, this correlation is not tight enough to robustly distinguish items with comparable frequencies. To address this important issue, we propose to combine the stochastic approach with a deterministic counting of items. Specifically, instead of keeping the maximum number of consecutive heads obtained by an item, we count the number of times the coin-flipping process of an item has exceeded a given threshold. This threshold is defined by combining theoretical results on the leader election and coupon collector problems. Results on simulated data show that the algorithm detects the top-k items remarkably well across a wide range of distributions.
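A minimal single-node sketch of the counting idea described in the abstract, written under stated assumptions rather than as the authors' algorithm. Each arrival of an item draws the number of consecutive heads of a fair coin before the first tail (so P(heads ≥ t) = 2^-t), and the sketch counts, per item, how many of its draws inside a count-based sliding window reached a threshold `tau`. The names `coin_flip_heads`, `max_heads_per_item` and `ThresholdTopK`, the window size, and the value of `tau` are hypothetical. The paper derives its threshold from leader election and coupon collector results, which this sketch does not reproduce, and the paper's algorithm is distributed, whereas this sketch runs on one node and stores the whole window for clarity. `max_heads_per_item` shows the earlier maximum-of-heads statistic for contrast.

```python
from __future__ import annotations

import random
from collections import Counter, deque


def coin_flip_heads(rng: random.Random) -> int:
    """Count consecutive heads of a fair coin before the first tail.

    P(result >= t) = 2**-t, so frequent items reach high values more often.
    """
    heads = 0
    while rng.random() < 0.5:
        heads += 1
    return heads


def max_heads_per_item(stream: list[str], seed: int = 0) -> dict[str, int]:
    """Earlier statistic, for contrast: keep only each item's maximum draw."""
    rng = random.Random(seed)
    best: dict[str, int] = {}
    for item in stream:
        best[item] = max(best.get(item, 0), coin_flip_heads(rng))
    return best


class ThresholdTopK:
    """Hypothetical single-node sketch of threshold-exceedance counting.

    For each item, count how many of its draws inside the last `window`
    arrivals reached `tau`. The whole window is stored for clarity, so this
    is not space-efficient.
    """

    def __init__(self, window: int, tau: int, seed: int = 0) -> None:
        self.window = window
        self.tau = tau                # assumed fixed; not the paper's formula
        self.rng = random.Random(seed)
        self.buffer = deque()         # (item, exceeded) pairs, oldest first
        self.exceed = Counter()       # item -> draws >= tau within the window

    def add(self, item: str) -> None:
        exceeded = coin_flip_heads(self.rng) >= self.tau
        self.buffer.append((item, exceeded))
        if exceeded:
            self.exceed[item] += 1
        # Slide the window: expire the oldest arrival once it is full.
        if len(self.buffer) > self.window:
            old_item, old_exceeded = self.buffer.popleft()
            if old_exceeded:
                self.exceed[old_item] -= 1
                if self.exceed[old_item] == 0:
                    del self.exceed[old_item]

    def top_k(self, k: int) -> list[tuple[str, int]]:
        """Items with the largest exceedance counts in the current window."""
        return self.exceed.most_common(k)


# Usage: a skewed stream in which "a" accounts for about half the arrivals.
stream_rng = random.Random(42)
others = [f"x{i}" for i in range(100)]
stream = ["a" if stream_rng.random() < 0.5 else stream_rng.choice(others)
          for _ in range(20000)]

sketch = ThresholdTopK(window=10000, tau=3, seed=1)
for element in stream:
    sketch.add(element)
print(sketch.top_k(3))
```

Because an item occurring m times in the window has on average m·2^-tau draws at or above `tau`, its exceedance count grows linearly with its frequency, whereas its maximum number of heads grows only roughly like log2(m). This is why items with comparable frequencies are hard to separate by the maximum alone.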
Pages: 8
Related papers
(50 in total)
  • [31] Top-k frequent items and item frequency tracking over sliding windows of any size
    Song, Chunyao
    Liu, Xuanming
    Ge, Tingjian
    Ge, Yao
    INFORMATION SCIENCES, 2019, 475 : 100 - 120
  • [32] Distributed Top-k subgraph matching
    Lan, C.
    Zhang, Y.
    Xing, C.
    Tsinghua University, 56 : 871 - 877
  • [33] Approximate distributed top-k queries
    Patt-Shamir, Boaz
    Shafrir, Allon
    DISTRIBUTED COMPUTING, 2008, 21 : 1 - 22
  • [34] Optimizing Distributed Top-k Queries
    Neumann, Thomas
    Bender, Matthias
    Michel, Sebastian
    Schenkel, Ralf
    Triantafillou, Peter
    Weikum, Gerhard
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2008, PROCEEDINGS, 2008, 5175 : 337 - +
  • [35] Distributed Top-k Pattern Mining
    Wang, Xin
    Xiang, Mingyue
    Zhan, Huayi
    Lan, Zhuo
    He, Yuang
    He, Yanxiao
    Sha, Yuji
    WEB AND BIG DATA, APWEB-WAIM 2021, PT II, 2021, 12859 : 203 - 220
  • [36] Secure Distributed Top-k Aggregation
    Jonsson, Kristjan V.
    Palmskog, Karl
    Vigfusson, Ymir
    2012 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2012
  • [37] Approximate distributed top-k queries
    Patt-Shamir, Boaz
    Shafrir, Allon
    DISTRIBUTED COMPUTING, 2008, 21 (01) : 1 - 22
  • [38] A Generic Framework for Top-k Pairs and Top-k Objects Queries over Sliding Windows
    Shen, Zhitao
    Cheema, Muhammad Aamir
    Lin, Xuemin
    Zhang, Wenjie
    Wang, Haixun
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (06) : 1349 - 1366
  • [39] A Sliding-Window Approach for Finding Top-k Frequent Itemsets from Uncertain Streams
    Zhang, Xiaojian
    Peng, Huili
    ADVANCES IN DATA AND WEB MANAGEMENT, PROCEEDINGS, 2009, 5446 : 597 - +
  • [40] Achieving Top-K-fairness for Finding Global Top-K Frequent Items
    Zhao, Yikai
    Zhou, Wei
    Han, Wenchen
    Zhong, Zheng
    Zhang, Yinda
    Zheng, Xiuqi
    Yang, Tong
    Cui, Bin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37 (04) : 1508 - 1526