PaMPa-HD: a Parallel MapReduce-based frequent Pattern miner for High-Dimensional data

Cited by: 13
Authors
Apiletti, Daniele [1]
Baralis, Elena [1]
Cerquitelli, Tania [1]
Garza, Paolo [1]
Pulvirenti, Fabio [1]
Michiardi, Pietro [2]
Affiliations
[1] Politecn Torino, Dipartimento Automat & Informat, Turin, Italy
[2] Eurecom, Sophia Antipolis, France
DOI
10.1109/ICDMW.2015.18
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Frequent closed itemset mining is among the most complex exploratory techniques in data mining, and provides the ability to discover hidden correlations in transactional datasets. The explosion of Big Data is leading to new parallel and distributed approaches. Unfortunately, most of them are designed to cope with low-dimensional datasets, whereas no distributed frequent closed itemset mining algorithm for high-dimensional data exists. This work introduces PaMPa-HD, a parallel MapReduce-based frequent closed itemset mining algorithm for high-dimensional datasets, based on Carpenter. Experiments performed on both real and synthetic datasets show the efficiency and scalability of PaMPa-HD.
Pages: 839-846
Page count: 8