Time-weighted counting for recently frequent pattern mining in data streams

Authors
Yongsub Lim
U. Kang
Affiliations
[1] SK Telecom, Big Data Tech. Lab
[2] Seoul National University, Department of Computer Science and Engineering
Keywords
Data stream; Time-weighted counting; Sampling; Frequent items; Frequent itemsets; Hot items; Top-k items
Abstract
How can we discover interesting patterns in time-evolving, high-speed data streams? How can we analyze such streams quickly and accurately, with little space overhead? How can we guarantee that the discovered patterns are self-consistent? High-speed data streams have been receiving increasing attention due to their wide range of applications, such as sensors, network traffic, and social networks. The most fundamental task on data streams is frequent pattern mining; in particular, focusing on recent data is important in real applications. In this paper, we develop two algorithms for discovering recently frequent patterns in data streams. First, we propose TwMinSwap to find the top-k recently frequent items in data streams; it is a deterministic version of our motivating algorithm TwSample, which provides theoretical guarantees based on item sampling. TwMinSwap improves on TwSample in speed, accuracy, and memory usage. Both require only O(k) memory space and need no prior knowledge of the stream, such as its length or the number of distinct items in it. Second, we propose TwMinSwap-Is to find the top-k recently frequent itemsets in data streams. We focus especially on keeping the discovered itemsets self-consistent, which is the most important property for reliable results, while using O(k) memory space under the assumption of a constant itemset size. Through extensive experiments, we demonstrate that TwMinSwap outperforms all competitors in accuracy and memory usage while running fast. We also show that TwMinSwap-Is is more accurate than its competitor and discovers recently frequent itemsets of reasonably large size (up to 5–7 items, depending on the dataset). Using TwMinSwap and TwMinSwap-Is, we report interesting discoveries in real-world data streams, including the difference in trends between the winning and losing U.S. presidential candidates, and temporal human contact patterns.
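The abstract names time-weighted counting and O(k)-memory tracking of the top-k recently frequent items, but does not spell out either mechanism. The following is a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes each occurrence is weighted by exponential decay, exp(-lam * (t - s)) for an occurrence at time s viewed at time t, with a hypothetical rate parameter `lam`. When all k slots are full, it evicts the item with the smallest decayed count using a decayed variant of the Space-Saving replacement rule. The class `DecayedTopK` and its methods are hypothetical names; this is not TwMinSwap or TwSample, whose exact update rules are not given here.

```python
import math
from dataclasses import dataclass


@dataclass
class Slot:
    count: float  # time-weighted count as of time `updated`
    updated: int  # timestamp of the slot's last update


class DecayedTopK:
    """Hypothetical sketch: keep at most k items with exponentially decayed counts.

    An occurrence at time s is assumed to contribute exp(-lam * (t - s)) to the
    count at time t. Eviction is a decayed Space-Saving variant, NOT TwMinSwap.
    """

    def __init__(self, k: int, lam: float) -> None:
        self.k = k
        self.lam = lam
        self.slots: dict[str, Slot] = {}  # at most k entries, so O(k) memory

    def _decayed(self, slot: Slot, t: int) -> float:
        # Lazily bring a stored count forward to time t.
        return slot.count * math.exp(-self.lam * (t - slot.updated))

    def add(self, item: str, t: int) -> None:
        if item in self.slots:
            slot = self.slots[item]
            slot.count = self._decayed(slot, t) + 1.0
            slot.updated = t
        elif len(self.slots) < self.k:
            self.slots[item] = Slot(1.0, t)
        else:
            # Evict the smallest decayed count; the newcomer inherits that
            # count plus 1 (an over-estimate, as in Space-Saving).
            victim = min(self.slots, key=lambda x: self._decayed(self.slots[x], t))
            inherited = self._decayed(self.slots.pop(victim), t)
            self.slots[item] = Slot(inherited + 1.0, t)

    def top(self, t: int) -> list[tuple[str, float]]:
        # Return tracked items ranked by their decayed counts at time t.
        scored = [(x, self._decayed(s, t)) for x, s in self.slots.items()]
        return sorted(scored, key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    tracker = DecayedTopK(k=2, lam=0.5)
    for t, item in enumerate(["a", "a", "a", "b", "c", "c", "c"]):
        tracker.add(item, t)
    print(tracker.top(6))
```

In this run, `c` outranks `a` at time 6: each appeared three times, but the occurrences of `a` are older and have decayed more, and `b` was evicted when `c` first arrived. Storing a last-update time per slot and decaying on read avoids rescaling every counter on each arrival; the linear scan for the minimum makes each eviction O(k), which a heap-based structure could reduce.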
Pages: 391–422
Page count: 31