A survey on algorithms for mining frequent itemsets over data streams

Cited by: 0
Authors
James Cheng
Yiping Ke
Wilfred Ng
Affiliations
[1] The Hong Kong University of Science and Technology, Department of Computer Science and Engineering
[2] HKUST
Keywords
Frequent itemsets; Stream mining; Window models; Approximate algorithms
Abstract
The increasing prominence of data streams arising in a wide range of advanced applications such as fraud detection and trend learning has led to the study of online mining of frequent itemsets (FIs). Unlike mining static databases, mining data streams poses many new challenges. In addition to the one-scan nature, the unbounded memory requirement and the high data arrival rate of data streams, the combinatorial explosion of itemsets exacerbates the mining task. The high complexity of the FI mining problem hinders the application of the stream mining techniques. We recognize that a critical review of existing techniques is needed in order to design and develop efficient mining algorithms and data structures that are able to match the processing rate of the mining with the high arrival rate of data streams. Within a unifying set of notations and terminologies, we describe in this paper the efforts and main techniques for mining data streams and present a comprehensive survey of a number of the state-of-the-art algorithms on mining frequent itemsets over data streams. We classify the stream-mining techniques into two categories based on the window model that they adopt in order to provide insights into how and why the techniques are useful. Then, we further analyze the algorithms according to whether they are exact or approximate and, for approximate approaches, whether they are false-positive or false-negative. We also discuss various interesting issues, including the merits and limitations in existing research and substantive areas for future research.
Pages: 1 / 27
Page count: 26
Related papers
50 in total
  • [31] Mining Approximate Frequent Itemsets over Data Streams Using Window Sliding Techniques
    Kim, Younghee
    Park, Eunkyoung
    Kim, Ungmo
    DATABASE THEORY AND APPLICATION, 2009, 64 : 49 - 56
  • [32] Finding frequent itemsets over online data streams
    Chang, Joong Hyuk
    Lee, Won Suk
    INFORMATION AND SOFTWARE TECHNOLOGY, 2006, 48 (07) : 606 - 618
  • [33] A Novel Strategy for Mining Frequent Closed Itemsets in Data Streams
    Tang, Keming
    Dai, Caiyan
    Chen, Ling
    JOURNAL OF COMPUTERS, 2012, 7 (07) : 1564 - 1573
  • [34] Mining frequent itemsets in data streams within a time horizon
    Troiano, Luigi
    Scibelli, Giacomo
    DATA & KNOWLEDGE ENGINEERING, 2014, 89 : 21 - 37
  • [35] Frequent Itemsets Mining in Data Streams Using Reconfigurable Hardware
    Bustio, Lazaro
    Cumplido, Rene
    Hernandez, Raudel
    Bande, Jose M.
    Feregrino, Claudia
    NEW FRONTIERS IN MINING COMPLEX PATTERNS, 2016, 9607 : 32 - 45
  • [36] Efficient mining algorithm of frequent itemsets for uncertain data streams
    Wang Qianqian
    Liu Fang-ai
    PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2016 : 443 - 446
  • [37] An Efficient Algorithm for Mining Closed Frequent Itemsets in Data Streams
    Ao, Fujiang
    Du, Jing
    Yan, Yuejin
    Liu, Baohong
    Huang, Kedi
    8TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY WORKSHOPS: CIT WORKSHOPS 2008, PROCEEDINGS, 2008 : 37 - +
  • [38] Mining Frequent Itemsets with Normalized Weight in Continuous Data Streams
    Kim, Younghee
    Kim, Wonyoung
    Kim, Ungmo
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2010, 6 (01) : 79 - 90
  • [39] Uncertain Frequent Itemsets Mining Algorithm on Data Streams with Constraints
    Yu, Qun
    Tang, Ke-Ming
    Tang, Shi-Xi
    Lv, Xin
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2016, 2016, 9937 : 192 - 201
  • [40] Mining Frequent Itemsets in Data Streams Based on Genetic Algorithm
    Han, Chong
    Sun, Lijuan
    Guo, Jian
    Chen, Xiaodong
    2013 15TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT), 2013 : 748 - 753