PaMPa-HD: a Parallel MapReduce-based frequent Pattern miner for High-Dimensional data

Cited by: 13
Authors
Apiletti, Daniele [1]
Baralis, Elena [1]
Cerquitelli, Tania [1]
Garza, Paolo [1]
Pulvirenti, Fabio [1]
Michiardi, Pietro [2]
Affiliations
[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, Turin, Italy
[2] Eurecom, Sophia Antipolis, France
DOI
10.1109/ICDMW.2015.18
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Frequent closed itemset mining is among the most complex exploratory techniques in data mining, and it provides the ability to discover hidden correlations in transactional datasets. The explosion of Big Data is leading to new parallel and distributed approaches. Unfortunately, most of them are designed to cope with low-dimensional datasets, whereas no distributed frequent closed itemset mining algorithm for high-dimensional data exists. This work introduces PaMPa-HD, a parallel MapReduce-based frequent closed itemset mining algorithm for high-dimensional datasets, based on Carpenter. The experimental results, obtained on both real and synthetic datasets, show the efficiency and scalability of PaMPa-HD.
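To make the abstract's terminology concrete, the following is a minimal single-machine sketch of frequent closed itemset mining by row enumeration, the general idea Carpenter is usually described as building on (exploring sets of rows rather than sets of items, which suits datasets with few rows and many items). It is not the PaMPa-HD algorithm: it has no pruning and no MapReduce distribution, and the function name `closed_frequent_itemsets` and the toy data are assumptions made for illustration only. It relies on the fact that the intersection of any group of transactions is a closed itemset.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force row enumeration (illustrative only, exponential in the row count).

    Returns a dict mapping each closed itemset (frozenset) to its support,
    i.e. the number of transactions that contain it.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    rows = [frozenset(t) for t in transactions]
    closed = {}
    # Any closed itemset with support >= min_support is the intersection
    # of its supporting rows, so groups of at least min_support rows suffice.
    for size in range(min_support, len(rows) + 1):
        for group in combinations(rows, size):
            itemset = frozenset.intersection(*group)
            if itemset and itemset not in closed:
                # Support is counted over all rows, not only the group.
                closed[itemset] = sum(1 for r in rows if itemset <= r)
    return closed


# Hypothetical toy dataset: three rows over five items.
toy_rows = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "e"}]
toy_result = closed_frequent_itemsets(toy_rows, min_support=2)
for items, support in sorted(toy_result.items(), key=lambda kv: -kv[1]):
    print(sorted(items), support)
```

With the toy data and a minimum support of 2, the sketch reports two closed itemsets: {a, b} with support 3 and {a, b, c} with support 2. A practical miner would avoid enumerating every row group, which is where pruning and the distributed design described in the abstract come in.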
Pages: 839-846
Number of pages: 8