Time-weighted counting for recently frequent pattern mining in data streams

Authors
Yongsub Lim
U. Kang
Affiliations
[1] SK Telecom, Big Data Tech. Lab
[2] Seoul National University, Department of Computer Science and Engineering
Keywords
Data stream; Time-weighted counting; Sampling; Frequent items; Frequent itemsets; Hot items; Top-k items
DOI
Not available
Abstract
How can we discover interesting patterns in time-evolving, high-speed data streams? How can we analyze such streams quickly and accurately, with little space overhead? How can we guarantee that the discovered patterns are self-consistent? High-speed data streams have been receiving increasing attention due to their wide range of applications, such as sensors, network traffic, and social networks. The most fundamental task on a data stream is frequent pattern mining; in real applications, focusing on recentness is especially important. In this paper, we develop two algorithms for discovering recently frequent patterns in data streams. First, we propose TwMinSwap to find the top-k recently frequent items in data streams; it is a deterministic version of our motivating algorithm TwSample, which provides theoretical guarantees based on item sampling. TwMinSwap improves on TwSample in terms of speed, accuracy, and memory usage. Both require only O(k) memory space and do not require any prior knowledge of the stream, such as its length or the number of distinct items in it. Second, we propose TwMinSwap-Is to find the top-k recently frequent itemsets in data streams. We focus especially on keeping the discovered itemsets self-consistent, which is the most important property for reliable results, while using O(k) memory space under the assumption of a constant itemset size. Through extensive experiments, we demonstrate that TwMinSwap outperforms all competitors in terms of accuracy and memory usage, with fast running time. We also show that TwMinSwap-Is is more accurate than its competitor and discovers recently frequent itemsets of reasonably large size (at most 5–7, depending on the dataset). Using TwMinSwap and TwMinSwap-Is, we report interesting discoveries in real-world data streams, including the difference in trends between the winning and the losing U.S. presidential candidates, and temporal human contact patterns.
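Time-weighted counting gives recent occurrences more weight than old ones. The following is a minimal sketch under stated assumptions, not the paper's TwMinSwap: it assumes exponential decay, so an item seen at times t_1, ..., t_m has count gamma^(t - t_1) + ... + gamma^(t - t_m) at time t, with 0 < gamma < 1, and it keeps at most k items using a Space-Saving-style min-swap rule that is substituted here for the paper's own replacement rule. The class `TimeWeightedTopK` and its methods are hypothetical names chosen for illustration.

```python
from collections.abc import Hashable
from typing import Dict, List, Tuple


class TimeWeightedTopK:
    """Hypothetical sketch: track at most k items with exponentially decayed counts.

    Assumption: an item seen at times t_1..t_m has count sum(gamma ** (t - t_i)).
    The eviction rule is a Space-Saving-style min-swap heuristic, NOT the
    exact TwMinSwap procedure from the paper.
    """

    def __init__(self, k: int, gamma: float) -> None:
        if k < 1 or not 0.0 < gamma < 1.0:
            raise ValueError("need k >= 1 and 0 < gamma < 1")
        self.k = k
        self.gamma = gamma
        # Stored values are true counts multiplied by `scale`, so decaying
        # every tracked count costs O(1) per arrival instead of O(k).
        self.scaled: Dict[Hashable, float] = {}
        self.scale = 1.0

    def _renormalize(self) -> None:
        # Fold the global scale into the stored values to avoid float overflow.
        for item in self.scaled:
            self.scaled[item] /= self.scale
        self.scale = 1.0

    def update(self, item: Hashable) -> None:
        # One arrival = one time step: every true count decays by gamma.
        self.scale /= self.gamma
        if self.scale > 1e100:
            self._renormalize()
        if item in self.scaled:
            self.scaled[item] += self.scale  # add 1 to the true count
        elif len(self.scaled) < self.k:
            self.scaled[item] = self.scale
        else:
            # Min-swap heuristic: evict the smallest tracked count; the newcomer
            # inherits it plus one fresh occurrence (an overestimate).
            victim = min(self.scaled, key=self.scaled.__getitem__)
            inherited = self.scaled.pop(victim)
            self.scaled[item] = inherited + self.scale

    def count(self, item: Hashable) -> float:
        # Untracked items report 0.0.
        return self.scaled.get(item, 0.0) / self.scale

    def top(self) -> List[Tuple[Hashable, float]]:
        # Tracked items with their current time-weighted counts, largest first.
        pairs = [(x, v / self.scale) for x, v in self.scaled.items()]
        return sorted(pairs, key=lambda p: -p[1])


if __name__ == "__main__":
    sketch = TimeWeightedTopK(k=2, gamma=0.5)
    for x in ["a", "a", "b", "c"]:
        sketch.update(x)
    print(sketch.top())  # [('c', 1.375), ('b', 0.5)]
```

Storing counts scaled by gamma^(-t) is a common trick for exponential decay: only one shared factor changes per arrival, and the occasional renormalization keeps the stored values finite. Finding the eviction victim is O(k) here; a heap or a stream-summary structure would lower that cost while keeping memory at O(k).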
Pages: 391–422
Number of pages: 31