PaMPa-HD: a Parallel MapReduce-based frequent Pattern miner for High-Dimensional data

Cited by: 13

Authors:

Apiletti, Daniele [1]

Baralis, Elena [1]

Cerquitelli, Tania [1]

Garza, Paolo [1]

Pulvirenti, Fabio [1]

Michiardi, Pietro [2]

Affiliations:

[1] Politecn Torino, Dipartimento Automat & Informat, Turin, Italy

[2] Eurecom, Sophia Antipolis, France

Source:

2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW) | 2015

DOI:

10.1109/ICDMW.2015.18

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Frequent closed itemset mining is among the most complex exploratory techniques in data mining, and provides the ability to discover hidden correlations in transactional datasets. The explosion of Big Data is leading to new parallel and distributed approaches. Unfortunately, most of them are designed to cope with low-dimensional datasets, whereas no distributed high-dimensional frequent closed itemset mining algorithm exists. This work introduces PaMPa-HD, a parallel MapReduce-based frequent closed itemset mining algorithm for high-dimensional datasets, based on Carpenter. The experimental results, obtained on both real and synthetic datasets, show the efficiency and scalability of PaMPa-HD.
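To make the term "frequent closed itemset" concrete, the following is a minimal single-machine sketch in Python. It is not PaMPa-HD and not the Carpenter algorithm itself: it only illustrates the row-enumeration idea of working over sets of rows rather than sets of items, which suits datasets with few rows and many items. The function name `closed_frequent_itemsets`, the toy data, and the brute-force strategy are all assumptions made for illustration; there is no pruning and no MapReduce parallelism here.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {closed itemset: support} by brute-force row enumeration.

    A closed itemset equals the intersection of all rows that contain it,
    so intersecting every group of at least `min_support` rows reaches
    every frequent closed itemset. Exponential in the number of rows:
    suitable for toy inputs only.
    """
    rows = [frozenset(t) for t in transactions]
    closed = {}
    # Start at 1 at the lowest so that an intersection is always defined.
    for size in range(max(1, min_support), len(rows) + 1):
        for group in combinations(rows, size):
            itemset = frozenset.intersection(*group)
            if itemset and itemset not in closed:
                # Support = number of rows containing the whole itemset.
                closed[itemset] = sum(itemset <= row for row in rows)
    return closed


# Hypothetical toy dataset (illustrative only, not from the paper).
toy_rows = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "e"},
    {"c", "d"},
]
result = closed_frequent_itemsets(toy_rows, min_support=2)
for itemset, support in sorted(
    result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))
):
    print(sorted(itemset), support)
```

In this toy run, `{a}` is frequent but not closed, because every row containing `a` also contains `b`; only `{a, b}` is reported. A practical row-enumeration miner would add pruning and, in a distributed setting, partition the search space across workers.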

Pages: 839-846

Page count: 8
