Time-weighted counting for recently frequent pattern mining in data streams

Authors
Yongsub Lim
U. Kang
Affiliations
[1] SK Telecom, Big Data Tech. Lab
[2] Seoul National University, Department of Computer Science and Engineering
Keywords
Data stream; Time-weighted counting; Sampling; Frequent items; Frequent itemsets; Hot items; Top-k items
Abstract
How can we discover interesting patterns in time-evolving, high-speed data streams? How can we analyze such streams quickly and accurately, with little space overhead? How can we guarantee that the discovered patterns are self-consistent? High-speed data streams have been receiving increasing attention because of their wide range of applications, such as sensors, network traffic, and social networks. The most fundamental task on a data stream is frequent pattern mining, and in real applications it is especially important to focus on recentness. In this paper, we develop two algorithms for discovering recently frequent patterns in data streams. First, we propose TwMinSwap to find the top-k recently frequent items in data streams. TwMinSwap is a deterministic version of our motivating algorithm TwSample, which provides theoretical guarantees based on item sampling. TwMinSwap improves on TwSample in terms of speed, accuracy, and memory usage. Both require only O(k) memory space and do not require any prior knowledge of the stream, such as its length or the number of distinct items in it. Second, we propose TwMinSwap-Is to find the top-k recently frequent itemsets in data streams. We focus especially on keeping the discovered itemsets self-consistent, which is the most important property for reliable results, while using O(k) memory space under the assumption of a constant itemset size. Through extensive experiments, we demonstrate that TwMinSwap outperforms all competitors in terms of accuracy and memory usage, with fast running time. We also show that TwMinSwap-Is is more accurate than the competitor and discovers recently frequent itemsets of reasonably large sizes (at most 5–7, depending on the dataset). Using TwMinSwap and TwMinSwap-Is, we report interesting discoveries in real-world data streams, including the difference in trends between the winning and losing U.S. presidential candidates, and temporal human contact patterns.
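To make the idea of time-weighted counting in O(k) memory concrete, the following is a minimal Python sketch. It is not the paper's TwMinSwap: the abstract does not give the update rule, so the exponential decay factor `alpha`, the eviction threshold of 1, and the class name `DecayedTopK` are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Tuple


class DecayedTopK:
    """Track at most k items with exponentially decayed counts (illustrative only)."""

    def __init__(self, k: int, alpha: float) -> None:
        if k <= 0 or not 0.0 < alpha <= 1.0:
            raise ValueError("need k > 0 and 0 < alpha <= 1")
        self.k = k
        self.alpha = alpha  # assumed per-step decay factor
        self.weights: Dict[Hashable, float] = {}

    def update(self, item: Hashable) -> None:
        # Age every monitored item by one time step, so older
        # occurrences contribute less than recent ones.
        for key in self.weights:
            self.weights[key] *= self.alpha

        if item in self.weights:
            self.weights[item] += 1.0
        elif len(self.weights) < self.k:
            self.weights[item] = 1.0
        else:
            # Assumed swap rule: replace the lightest monitored item
            # only if its weight has decayed below that of one fresh
            # occurrence; otherwise the new item is dropped.
            victim = min(self.weights, key=self.weights.get)
            if self.weights[victim] < 1.0:
                del self.weights[victim]
                self.weights[item] = 1.0

    def top(self) -> List[Tuple[Hashable, float]]:
        # Monitored items, heaviest (most recently frequent) first.
        return sorted(self.weights.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    tracker = DecayedTopK(k=2, alpha=0.5)
    for x in ["a", "a", "b", "c", "c", "c"]:
        tracker.update(x)
    print(tracker.top())
```

The sketch never stores more than k entries and needs no knowledge of the stream length or the number of distinct items, which matches the properties claimed in the abstract. Each update here costs O(k) time because every weight is decayed explicitly.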
Pages: 391-422
Page count: 31
Citation
Lim, Yongsub; Kang, U. Time-weighted counting for recently frequent pattern mining in data streams. Knowledge and Information Systems, 2017, 53(2): 391-422.