Frequent Itemsets Mining for Big Data: A Comparative Analysis

Cited by: 29
Authors
Apiletti, Daniele [1]
Baralis, Elena [1]
Cerquitelli, Tania [1]
Garza, Paolo [1]
Pulvirenti, Fabio [1]
Venturini, Luca [1]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, Turin, Italy
Keywords
Big Data; Frequent itemset mining; Hadoop and Spark platforms
DOI
10.1016/j.bdr.2017.06.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Itemset mining is a well-known exploratory data mining technique used to discover interesting correlations hidden in a data collection. Since it supports different targeted analyses, it is profitably exploited in a wide range of different domains, ranging from network traffic data to medical records. With the increasing amount of generated data, different scalable algorithms have been developed, exploiting the advantages of distributed computing frameworks, such as Apache Hadoop and Spark. This paper reviews Hadoop- and Spark-based scalable algorithms addressing the frequent itemset mining problem in the Big Data domain through both theoretical and experimental comparative analyses. Since the itemset mining task is computationally expensive, its distribution and parallelization strategies heavily affect memory usage, load balancing, and communication costs. A detailed discussion of the algorithmic choices of the distributed methods for frequent itemset mining is followed by an experimental analysis comparing the performance of state-of-the-art distributed implementations on both synthetic and real datasets. The strengths and weaknesses of the algorithms are thoroughly discussed with respect to the dataset features (e.g., data distribution, average transaction length, number of records), and specific parameter settings. Finally, based on theoretical and experimental analyses, open research directions for the parallelization of the itemset mining problem are presented. © 2017 Elsevier Inc. All rights reserved.
Pages: 67-83
Page count: 17
Related papers
50 in total
  • [1] A Comparative Analysis of Algorithms for Mining Frequent Itemsets
    Busarov, Vyacheslav
    Grafeeva, Natalia
    Mikhailova, Elena
    DATABASES AND INFORMATION SYSTEMS, DB&IS 2016, 2016, 615 : 136 - 150
  • [2] Scalable Vertical Mining for Big Data Analytics of Frequent Itemsets
    Leung, Carson K.
    Zhang, Hao
    Souza, Joglas
    Lee, Wookey
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2018, PT I, 2018, 11029 : 3 - 17
  • [3] The Algorithm for Mining Global Frequent Itemsets based on Big Data
    Bo, He
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON LOGISTICS, ENGINEERING, MANAGEMENT AND COMPUTER SCIENCE (LEMCS 2015), 2015, 117 : 158 - 161
  • [4] A New Approximate Method For Mining Frequent Itemsets From Big Data
    Valiullin, Timur
    Huang, Zhexue
    Wei, Chenghao
    Yin, Jianfei
    Wu, Dingming
    Egorova, Iuliia
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2021, 18 (03) : 641 - 656
  • [5] Research on Mining Global Maximal Frequent Itemsets for Health Big Data
    He, Bo
    Pei, Jianhui
    2017 IEEE 3RD INFORMATION TECHNOLOGY AND MECHATRONICS ENGINEERING CONFERENCE (ITOEC), 2017, : 1143 - 1146
  • [6] Mining Frequent Itemsets from Online Data Streams: Comparative Study
    Nabil, HebaTallah Mohamed
    Eldin, Ahmed Sharaf
    Belal, Mohamed Abd El-Fattah
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2013, 4 (07) : 117 - 125
  • [7] Data Projection Effects in Frequent Itemsets Mining
    Yakop, Mohammad Arsyad Mohd
    Mutalib, Sofianita
    Abdul-Rahman, Shuzlina
    SOFT COMPUTING IN DATA SCIENCE, SCDS 2015, 2015, 545 : 23 - 32
  • [8] Mining frequent itemsets from uncertain data
    Chui, Chun-Kit
    Kao, Ben
    Hung, Edward
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2007, 4426 : 47 - +
  • [9] Mining maximal frequent itemsets in uncertain data
    Tang, Xianghong
    Yang, Quanwei
    Zheng, Yang
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2015, 43 (09) : 29 - 34
  • [10] Frequent Itemsets Mining on Weighted Uncertain Data
    Alharbi, Manal
    Pathak, Sudipta
    Rajasekaran, Sanguthevar
    2014 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT), 2014, : 201 - 206