On the Fly Detection of the Top-k Items in the Distributed Sliding Window Model

Cited by: 0
Authors
Anceaume, Emmanuelle [1]
Busnel, Yann [2]
Cazacu, Vasile [1]
Affiliations
[1] IRISA, CNRS, Rennes, France
[2] IMT Atlantique, IRISA, Cesson Sevigne, France
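Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract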
This paper presents a new algorithm that detects, on the fly, the k most frequent items in the sliding window model. The algorithm is distributed among the nodes of the system. It is inspired by a recent and innovative approach that associates with each item a stochastic value correlated with its frequency, instead of trying to estimate its number of occurrences. This stochastic value is the number of consecutive heads obtained in a sequence of coin flips before the first tail occurs. The original approach retained only the maximum number of consecutive heads obtained by an item, since an item that occurs often has a higher probability of reaching a high value. While effective for very skewed data distributions, this correlation is not tight enough to robustly distinguish items with comparable frequencies. To address this issue, we propose to combine the stochastic approach with a deterministic counting of items. Specifically, instead of keeping the maximum number of consecutive heads obtained by an item, we count the number of times the coin flipping process of an item has exceeded a given threshold. This threshold is defined by combining theoretical results on the leader election and coupon collector problems. Results on simulated data show that the detection of the top-k items is highly effective across a wide range of distributions.
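The counting idea described in the abstract can be illustrated with a minimal single-node sketch. It is not the authors' algorithm: it ignores both the distribution among nodes and the sliding window, and the abstract does not give the threshold value, so `threshold` is a free parameter here. The function names `consecutive_heads` and `top_k_by_threshold_counts` are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter


def consecutive_heads(rng):
    """Flip a fair coin until the first tail; return the number of heads seen."""
    heads = 0
    while rng.random() < 0.5:  # each flip is a head with probability 1/2
        heads += 1
    return heads


def top_k_by_threshold_counts(stream, k, threshold, seed=0):
    """Return up to k items whose coin-flip runs exceeded `threshold` most often.

    Assumption: `threshold` is an arbitrary illustrative parameter. The paper
    derives its threshold from leader election and coupon collector results,
    which are not reproduced here.
    """
    rng = random.Random(seed)
    exceed_counts = Counter()
    for item in stream:
        # One coin-flipping process per occurrence of the item.
        if consecutive_heads(rng) > threshold:
            exceed_counts[item] += 1
    # Items with the largest exceed counts are reported as the top-k candidates.
    return [item for item, _ in exceed_counts.most_common(k)]


if __name__ == "__main__":
    # Hypothetical stream with three items of different frequencies.
    stream = ["a"] * 500 + ["b"] * 300 + ["c"] * 50
    random.Random(1).shuffle(stream)
    print(top_k_by_threshold_counts(stream, k=2, threshold=2))
```

A higher threshold means fewer occurrences are counted, which reduces the amount of state updated per item at the cost of more variance in the ranking. A threshold of -1 counts every occurrence and degenerates into exact frequency counting.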
Pages: 8