Time-weighted counting for recently frequent pattern mining in data streams

Authors
Yongsub Lim
U. Kang
Affiliations
[1] SK Telecom,Big Data Tech. Lab
[2] Seoul National University,Department of Computer Science and Engineering
Keywords
Data stream; Time-weighted counting; Sampling; Frequent items; Frequent itemsets; Hot items; Top-k items
Abstract
How can we discover interesting patterns from time-evolving high-speed data streams? How can we analyze data streams quickly and accurately with little space overhead? How can we guarantee that the discovered patterns are self-consistent? High-speed data streams have been receiving increasing attention because of their wide range of applications, such as sensors, network traffic, and social networks. The most fundamental task on a data stream is frequent pattern mining, and in real applications it is especially important to focus on recentness. In this paper, we develop two algorithms for discovering recently frequent patterns in data streams. First, we propose TwMinSwap to find top-k recently frequent items in data streams. TwMinSwap is a deterministic version of our motivating algorithm TwSample, which provides theoretical guarantees based on item sampling. TwMinSwap improves on TwSample in terms of speed, accuracy, and memory usage. Both require only O(k) memory space and do not require any prior knowledge about the stream, such as its length or the number of distinct items in it. Second, we propose TwMinSwap-Is to find top-k recently frequent itemsets in data streams. We especially focus on keeping the discovered itemsets self-consistent, which is the most important property for reliable results, while using O(k) memory space under the assumption of a constant itemset size. Through extensive experiments, we demonstrate that TwMinSwap outperforms all competitors in terms of accuracy and memory usage, with fast running time. We also show that TwMinSwap-Is is more accurate than the competitor and discovers recently frequent itemsets with reasonably large sizes (at most 5–7, depending on the dataset). Thanks to TwMinSwap and TwMinSwap-Is, we report interesting discoveries in real-world data streams, including the difference in trends between the winning and losing U.S. presidential candidates, and temporal human contact patterns.
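The abstract does not give the details of time-weighted counting, so the following is only a minimal illustrative sketch of the general idea: a summary with at most k slots whose counts decay at every time step. The class name `DecayedTopK`, the exponential `decay` factor, and the rule of swapping out the smallest slot only when its decayed count has fallen below 1 are all assumptions made for illustration, not the authors' published algorithm.

```python
from typing import Dict, Hashable, List, Tuple


class DecayedTopK:
    """Hypothetical k-slot summary with exponentially decayed counts.

    Illustration only: the decay model and the swap rule are assumptions,
    not taken from the paper.
    """

    def __init__(self, k: int, decay: float) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.k = k
        self.decay = decay
        self.counts: Dict[Hashable, float] = {}

    def update(self, item: Hashable) -> None:
        """Process one stream element; memory stays bounded by k slots."""
        # Age every monitored count by one time step.
        for key in self.counts:
            self.counts[key] *= self.decay

        if item in self.counts:
            self.counts[item] += 1.0
        elif len(self.counts) < self.k:
            self.counts[item] = 1.0
        else:
            # Assumed swap rule: evict the weakest slot only if its decayed
            # count is below the weight of a single fresh occurrence.
            victim = min(self.counts, key=self.counts.get)
            if self.counts[victim] < 1.0:
                del self.counts[victim]
                self.counts[item] = 1.0

    def top(self) -> List[Tuple[Hashable, float]]:
        """Return monitored items sorted by decayed count, largest first."""
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    summary = DecayedTopK(k=2, decay=0.5)
    for element in ["a", "b", "a", "c", "a", "c", "c", "c"]:
        summary.update(element)
    print(summary.top())
```

In this toy stream, "c" dominates the most recent positions, so it ends up ranked first even though "a" occurred three times earlier. The summary never holds more than k entries, and it needs neither the stream length nor the number of distinct items in advance.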
Pages: 391-422
Page count: 31