Frequent Itemsets Mining for Big Data: A Comparative Analysis

Cited by: 29
Authors
Apiletti, Daniele [1]
Baralis, Elena [1]
Cerquitelli, Tania [1]
Garza, Paolo [1]
Pulvirenti, Fabio [1]
Venturini, Luca [1]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, Turin, Italy
Keywords
Big Data; Frequent itemset mining; Hadoop and Spark platforms
DOI
10.1016/j.bdr.2017.06.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Itemset mining is a well-known exploratory data mining technique used to discover interesting correlations hidden in a data collection. Since it supports different targeted analyses, it is profitably exploited in a wide range of different domains, ranging from network traffic data to medical records. With the increasing amount of generated data, different scalable algorithms have been developed, exploiting the advantages of distributed computing frameworks, such as Apache Hadoop and Spark. This paper reviews Hadoop- and Spark-based scalable algorithms addressing the frequent itemset mining problem in the Big Data domain through both theoretical and experimental comparative analyses. Since the itemset mining task is computationally expensive, its distribution and parallelization strategies heavily affect memory usage, load balancing, and communication costs. A detailed discussion of the algorithmic choices of the distributed methods for frequent itemset mining is followed by an experimental analysis comparing the performance of state-of-the-art distributed implementations on both synthetic and real datasets. The strengths and weaknesses of the algorithms are thoroughly discussed with respect to the dataset features (e.g., data distribution, average transaction length, number of records), and specific parameter settings. Finally, based on theoretical and experimental analyses, open research directions for the parallelization of the itemset mining problem are presented. © 2017 Elsevier Inc. All rights reserved.
Pages: 67 - 83
Page count: 17
Related papers
50 in total
  • [41] Towards a new approach for mining frequent itemsets on data stream
    Chedy Raïssi
    Pascal Poncelet
    Maguelonne Teisseire
    Journal of Intelligent Information Systems, 2007, 28 : 23 - 36
  • [42] Mining frequent itemsets in data streams within a time horizon
    Troiano, Luigi
    Scibelli, Giacomo
    DATA & KNOWLEDGE ENGINEERING, 2014, 89 : 21 - 37
  • [43] Performance Evaluation of Methods for Mining Frequent Itemsets on Temporal Data
    Tripathi, Tripti
    Yadav, Divakar
    SECOND INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS AND COMMUNICATION TECHNOLOGIES, ICCNCT 2019, 2020, 44 : 910 - 917
  • [44] Frequent Itemsets Mining in Data Streams Using Reconfigurable Hardware
    Bustio, Lazaro
    Cumplido, Rene
    Hernandez, Raudel
    Bande, Jose M.
    Feregrino, Claudia
    NEW FRONTIERS IN MINING COMPLEX PATTERNS, 2016, 9607 : 32 - 45
  • [45] New Policy of Maximal Frequent Itemsets in Data Stream Mining
    Xu, ChongHuan
    Ju, ChunHua
    ADVANCED MECHANICAL ENGINEERING, PTS 1 AND 2, 2010, 26-28 : 118 - +
  • [46] A novel approach for data stream maximal frequent itemsets mining
    Xu C.-H.
    Xu, Chong-Huan (talentxch@163.com), 1600, Inderscience Enterprises Ltd., 29, route de Pre-Bois, Case Postale 856, CH-1215 Geneva 15, CH-1215, Switzerland (10) : 224 - 231
  • [47] Efficient mining algorithm of frequent itemsets for uncertain data streams
    Wang Qianqian
    Liu Fang-ai
    PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2016, : 443 - 446
  • [48] Mining Frequent Closed Itemsets in Large High Dimensional Data
    余光柱
    曾宪辉
    邵世煌
    Journal of Donghua University(English Edition), 2008, 25 (04) : 416 - 424
  • [49] Mining of Probabilistic Frequent Itemsets over Uncertain Data Streams
    Liu Lixin
    Zhang Xiaolin
    Zhang Huanxiang
    2014 11TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA), 2014, : 231 - 237
  • [50] An Efficient Algorithm for Mining Closed Frequent Itemsets in Data Streams
    Ao, Fujiang
    Du, Jing
    Yan, Yuejin
    Liu, Baohong
    Huang, Kedi
    8TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY WORKSHOPS: CIT WORKSHOPS 2008, PROCEEDINGS, 2008, : 37 - +