On the Fly Detection of the Top-k Items in the Distributed Sliding Window Model

Authors
Anceaume, Emmanuelle [1]
Busnel, Yann [2]
Cazacu, Vasile [1]
Affiliations
[1] IRISA, CNRS, Rennes, France
[2] IMT Atlantique, IRISA, Cesson Sevigne, France
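DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812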
Abstract
This paper presents a new algorithm that detects, on the fly, the k most frequent items in the sliding window model. The algorithm is distributed among the nodes of the system. It is inspired by a recent approach that associates each item with a stochastic value correlated with its frequency, instead of trying to estimate its number of occurrences. This stochastic value is the number of consecutive heads obtained in a sequence of coin flips before the first tail occurs. The original approach retained only the maximum number of consecutive heads obtained by an item, since an item that occurs often has a higher probability of reaching a high value. While effective for highly skewed data distributions, this correlation is not tight enough to robustly distinguish items with comparable frequencies. To address this issue, we propose combining the stochastic approach with deterministic counting of items. Specifically, instead of keeping the maximum number of consecutive heads obtained by an item, we count the number of times the coin-flipping process of an item has exceeded a given threshold. This threshold is defined by combining theoretical results on the leader election and coupon collector problems. Results on simulated data show that the top-k items are detected accurately across a wide range of distributions.
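The counting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it runs on a single node, ignores the sliding window, and takes the threshold as a plain parameter, whereas the paper derives it from leader election and coupon collector results. The names `consecutive_heads` and `top_k_by_threshold` are hypothetical and chosen only for this illustration.

```python
import random
from collections import Counter


def consecutive_heads(rng):
    """Flip a fair coin until the first tail; return the number of heads seen."""
    heads = 0
    while rng.random() < 0.5:
        heads += 1
    return heads


def top_k_by_threshold(stream, k, threshold, rng=None):
    """Return up to k items whose coin-flip runs most often exceeded `threshold`.

    Illustrative assumption: `threshold` is supplied by the caller. The paper
    defines it analytically; that definition is not reproduced here.
    """
    rng = rng or random.Random()
    exceed = Counter()
    for item in stream:
        # One coin-flipping process per occurrence of the item.
        if consecutive_heads(rng) > threshold:
            exceed[item] += 1
    # Items that exceeded the threshold most often are reported as the top-k.
    return [item for item, _ in exceed.most_common(k)]


if __name__ == "__main__":
    rng = random.Random(42)
    stream = ["a"] * 6000 + ["b"] * 3000 + [f"x{i}" for i in range(100)] * 10
    rng.shuffle(stream)
    print(top_k_by_threshold(stream, k=2, threshold=3, rng=rng))
```

A run of heads exceeds a threshold t with probability 2^-(t+1), so only a small fraction of occurrences increments a counter, and frequent items accumulate increments faster than rare ones. Counting these events, rather than keeping only the maximum run length, is what the abstract credits for separating items with comparable frequencies.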
Pages: 8