PaMPa-HD: a Parallel MapReduce-based frequent Pattern miner for High-Dimensional data

Cited by: 13
Authors
Apiletti, Daniele [1]
Baralis, Elena [1]
Cerquitelli, Tania [1]
Garza, Paolo [1]
Pulvirenti, Fabio [1]
Michiardi, Pietro [2]
Affiliations
[1] Politecn Torino, Dipartimento Automat & Informat, Turin, Italy
[2] Eurecom, Sophia Antipolis, France
DOI
10.1109/ICDMW.2015.18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Frequent closed itemset mining is among the most complex exploratory techniques in data mining, and it can discover hidden correlations in transactional datasets. The explosion of Big Data is leading to new parallel and distributed approaches. Unfortunately, most of them are designed to cope with low-dimensional datasets, whereas no distributed frequent closed itemset mining algorithm for high-dimensional datasets exists. This work introduces PaMPa-HD, a parallel MapReduce-based frequent closed itemset mining algorithm for high-dimensional datasets, based on the Carpenter algorithm. Experimental results on both real and synthetic datasets show the efficiency and scalability of PaMPa-HD.
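For readers unfamiliar with the technique, the Python sketch below illustrates frequent closed itemset mining by row enumeration, the strategy Carpenter adopts for datasets with many items but few rows. It is a minimal, sequential sketch under stated assumptions: the function name `closed_itemsets_by_rows`, the input format (one set of items per transaction) and `minsup` as an absolute row count are hypothetical; duplicate row sets are avoided with an LCM-style prefix-preserving closure check rather than Carpenter's own pruning rules; and nothing here reflects how PaMPa-HD splits the search across MapReduce tasks.

```python
from functools import reduce
from typing import Dict, FrozenSet, List, Set


def closed_itemsets_by_rows(rows: List[Set[str]], minsup: int) -> Dict[FrozenSet[str], int]:
    """Hypothetical helper: map each frequent closed itemset to its support.

    rows   -- one set of items per transaction (row)
    minsup -- minimum support as an absolute number of rows
    """
    n = len(rows)

    # Transposed table (as in Carpenter): item -> ids of the rows containing it.
    tt: Dict[str, Set[int]] = {}
    for rid, row in enumerate(rows):
        for item in row:
            tt.setdefault(item, set()).add(rid)

    def rows_containing(items: FrozenSet[str]) -> FrozenSet[int]:
        # R(I): rows that contain every item of I (every row if I is empty).
        if not items:
            return frozenset(range(n))
        return frozenset(reduce(set.intersection, (tt[i] for i in items)))

    result: Dict[FrozenSet[str], int] = {}

    def expand(closed_rows: FrozenSet[int], itemset: FrozenSet[str], core: int) -> None:
        # Extend a closed row set with each later row r, one branch per row.
        for r in range(core + 1, n):
            if r in closed_rows:
                continue
            # I(X + r) = I(X) intersected with the items of row r.
            new_items = itemset & rows[r]
            if not new_items:
                continue  # empty itemset: not reported, and more rows keep it empty
            new_rows = rows_containing(new_items)  # closure R(I(X + r))
            # Prefix-preserving check (LCM-style): if the closure pulls in a
            # row below r that X lacked, this closed row set belongs to another
            # branch, so skip it to avoid duplicates.
            if any(x < r and x not in closed_rows for x in new_rows):
                continue
            # Support only grows as rows are added, so infrequent nodes are
            # still expanded; they are just not reported.
            if len(new_rows) >= minsup:
                result[new_items] = len(new_rows)
            expand(new_rows, new_items, r)

    # Root of the search: the closure of the empty row set.
    all_items = frozenset(tt)
    root_rows = rows_containing(all_items)
    if all_items and root_rows and len(root_rows) >= minsup:
        result[all_items] = len(root_rows)
    expand(root_rows, all_items, -1)
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c", "d"}]
    found = closed_itemsets_by_rows(data, minsup=2)
    for items, support in sorted(found.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(items), support)
```

On the four-row example, the sketch reports {a}, {b} and {c} with support 3 and {a, b}, {a, c} and {b, c} with support 2; {a, b, c} and {b, c, d} are closed but fall below `minsup`. Because support can only grow as rows are added to a row set, the sketch cannot discard infrequent branches early and stops only when the intersected itemset becomes empty. That cost is tolerable when rows are few, which is the high-dimensional setting the abstract targets.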
Pages: 839-846
Number of pages: 8
Related papers (50 in total)
  • [31] A Parallel MapReduce Algorithm to Efficiently Support Itemset Mining on High Dimensional Data
    Apiletti, Daniele
    Baralis, Elena
    Cerquitelli, Tania
    Garza, Paolo
    Pulvirenti, Fabio
    Michiardi, Pietro
    BIG DATA RESEARCH, 2017, 10 : 53 - 69
  • [32] PLASMA-HD: Probing the LAttice Structure and MAkeup of High-dimensional Data
    Fuhry, David
    Zhang, Yang
    Satuluri, Venu
    Nandi, Arnab
    Parthasarathy, Srinivasan
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (12) : 1318 - 1321
  • [33] Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data
    Kiraly, Andras
    Gyenesei, Attila
    Abonyi, Janos
    SCIENTIFIC WORLD JOURNAL, 2014,
  • [34] PATTERN-RECOGNITION OF MULTIVARIATE ANALYTICAL DATA BY PLOTS OF HIGH-DIMENSIONAL DATA
    GEURTS, FL
    KATEMAN, G
    ANALYTICA CHIMICA ACTA, 1985, 176 (OCT) : 253 - 257
  • [35] The High-Activity Parallel Implementation of Data Preprocessing Based on MapReduce
    He, Qing
    Tan, Qing
    Ma, Xudong
    Shi, Zhongzhi
    ROUGH SET AND KNOWLEDGE TECHNOLOGY (RSKT), 2010, 6401 : 646 - 654
  • [36] A Parallel Coordinates Plot Method Based on Unsupervised Feature Selection for High-Dimensional Data Visualization
    Lou, Jiaqi
    Dong, Ke
    Wang, Maosen
    IWCMC 2021: 2021 17TH INTERNATIONAL WIRELESS COMMUNICATIONS & MOBILE COMPUTING CONFERENCE (IWCMC), 2021 : 532 - 536
  • [37] Parallel high-dimensional index structure using cell-based filtering for multimedia data
    Chang, Jae-Woo
    Kim, Yong-Ki
    Kim, Young-Jin
    FRONTIERS OF HIGH PERFORMANCE COMPUTING AND NETWORKING - ISPA 2006 WORKSHOPS, PROCEEDINGS, 2006, 4331 : 781 - +
  • [38] PALLADIO: a parallel framework for robust variable selection in high-dimensional data
    Barbieri, Matteo
    Fiorini, Samuele
    Tomasi, Federico
    Barla, Annalisa
    PROCEEDINGS OF PYHPC2016: 6TH WORKSHOP ON PYTHON FOR HIGH-PERFORMANCE AND SCIENTIFIC COMPUTING, 2016 : 19 - 26
  • [39] SCEA: A Parallel Clustering Ensemble Algorithm for High-Dimensional Massive Data
    Liao, Bin
    Huang, Jing-Lai
    Wang, Xin
    Sun, Rui-Na
    Ge, Xiao-Yan
    Guo, Bing-Lei
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2021, 49 (06) : 1077 - 1087
  • [40] Pattern Alternating Maximization Algorithm for Missing Data in High-Dimensional Problems
    Stadler, Nicolas
    Stekhoven, Daniel J.
    Buehlmann, Peter
    JOURNAL OF MACHINE LEARNING RESEARCH, 2014, 15 : 1903 - 1928