On the Fly Detection of the Top-k Items in the Distributed Sliding Window Model

Cited by: 0
Authors
Anceaume, Emmanuelle [1]
Busnel, Yann [2]
Cazacu, Vasile [1]
Affiliations
[1] IRISA, CNRS, Rennes, France
[2] IMT Atlantique, IRISA, Cesson Sevigne, France
DOI
Not available
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
This paper presents a new algorithm that detects, on the fly, the k most frequent items in the sliding window model. The algorithm is distributed among the nodes of the system. It is inspired by a recent and innovative approach that associates with each item a stochastic value correlated with the item's frequency, instead of trying to estimate its number of occurrences. This stochastic value is the number of consecutive heads obtained when flipping a coin until the first tail occurs. The original approach retained only the maximum number of consecutive heads obtained by an item, since an item that occurs often is more likely to obtain a high value. While effective for very skewed data distributions, this correlation is not tight enough to robustly distinguish items with comparable frequencies. To address this important issue, we propose to combine the stochastic approach with a deterministic counting of items. Specifically, instead of keeping the maximum number of consecutive heads obtained by an item, we count the number of times the coin-flipping process of an item has exceeded a given threshold. This threshold is defined by combining theoretical results on the leader election and coupon collector problems. Results on simulated data show that the detection of the top-k items is highly effective over a large range of distributions.
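The abstract describes the two scoring rules only informally. The Python sketch below contrasts them on a single node, under stated assumptions: a fair coin, a count-based window holding the last `window` arrivals, and a fixed, hypothetical `threshold`. The paper derives its threshold from leader-election and coupon-collector results, which are not reproduced here, and the distributed aggregation across nodes is omitted. All names (`consecutive_heads`, `MaxHeadsTracker`, `ThresholdCountTracker`) are illustrative and are not the authors' implementation.

```python
import random
from collections import Counter, deque
from typing import Dict, Hashable, List


def consecutive_heads(rng: random.Random) -> int:
    """Flip a fair coin until the first tail; return the number of heads seen."""
    heads = 0
    while rng.random() < 0.5:  # heads with probability 1/2
        heads += 1
    return heads


class MaxHeadsTracker:
    """Original idea: per item, remember the longest run of heads ever drawn."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.max_heads: Dict[Hashable, int] = {}

    def observe(self, item: Hashable) -> None:
        value = consecutive_heads(self.rng)
        self.max_heads[item] = max(self.max_heads.get(item, 0), value)


class ThresholdCountTracker:
    """Refined idea (sketch): count how often an item's run of heads exceeds a
    threshold, restricted to the last `window` arrivals (count-based window)."""

    def __init__(self, threshold: int, window: int, seed: int = 0) -> None:
        self.threshold = threshold  # hypothetical fixed value, not the paper's derived one
        self.window = window
        self.rng = random.Random(seed)
        self.recent: deque = deque()  # (item, exceeded) for the last `window` arrivals
        self.counts: Counter = Counter()  # exceedances per item inside the window

    def observe(self, item: Hashable) -> None:
        exceeded = consecutive_heads(self.rng) > self.threshold
        self.recent.append((item, exceeded))
        if exceeded:
            self.counts[item] += 1
        if len(self.recent) > self.window:  # slide: expire the oldest arrival
            old_item, old_exceeded = self.recent.popleft()
            if old_exceeded:
                self.counts[old_item] -= 1
                if self.counts[old_item] == 0:
                    del self.counts[old_item]

    def top_k(self, k: int) -> List[Hashable]:
        """Items with the most threshold exceedances in the current window."""
        return [item for item, _ in self.counts.most_common(k)]
```

With `threshold=3`, a single arrival exceeds the threshold with probability (1/2)^4 = 1/16, so an item's expected exceedance count is proportional to its frequency in the window. The maximum-based score, by contrast, grows only logarithmically with frequency, which is why it separates items of comparable frequency poorly. Choosing the threshold is the part the paper grounds in its leader-election and coupon-collector analysis.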
Pages: 8
Related papers (50 in total)
  • [21] Mining top-k frequent patterns over data streams sliding window
    Chen, Hui
    Journal of Intelligent Information Systems, 2014, 42: 111-131
  • [22] Spatio-temporal top-k term search over sliding window
    Chen, Lisi
    Shang, Shuo
    Yao, Bin
    Zheng, Kai
    World Wide Web: Internet and Web Information Systems, 2019, 22(05): 1953-1970
  • [23] Efficiently Finding Top-K Items from Evolving Distributed Data Streams
    Qi, Baoyuan
    Ma, Gang
    Shi, Zhongzhi
    Wang, Wei
    2014 10th International Conference on Semantics, Knowledge and Grids (SKG), 2014: 137-140
  • [24] Mining top-k frequent closed itemsets over data streams using the sliding window model
    Tsai, Pauray S. M.
    Expert Systems with Applications, 2010, 37(10): 6968-6973
  • [25] Time- and Space-Efficient Sliding Window Top-k Query Processing
    Pripuzic, Kresimir
    Zarko, Ivana Podnar
    Aberer, Karl
    ACM Transactions on Database Systems, 2015, 40(01)
  • [26] Geo-Social Keyword Top-k Data Monitoring over Sliding Window
    Nishio, Shunya
    Amagata, Daichi
    Hara, Takahiro
    Database and Expert Systems Applications, DEXA 2017, Pt I, 2017, 10438: 409-424
  • [27] Mining top-k high-utility itemsets from a data stream under sliding window model
    Dawar, Siddharth
    Sharma, Veronica
    Goyal, Vikram
    Applied Intelligence, 2017, 47: 1240-1255
  • [28] SKYPE: Top-k Spatial-keyword Publish/Subscribe Over Sliding Window
    Wang, Xiang
    Zhang, Ying
    Zhang, Wenjie
    Lin, Xuemin
    Huang, Zengfeng
    Proceedings of the VLDB Endowment, 2016, 9(07): 588-599
  • [29] Mining top-k high-utility itemsets from a data stream under sliding window model
    Dawar, Siddharth
    Sharma, Veronica
    Goyal, Vikram
    Applied Intelligence, 2017, 47(04): 1240-1255
  • [30] Top-k Frequent Items and Item Frequency Tracking over Sliding Windows of Any Sizes
    Song, Chunyao
    Liu, Xuanming
    Ge, Tingjian
    2017 IEEE 33rd International Conference on Data Engineering (ICDE 2017), 2017: 199-202