PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining

Cited by: 0
Authors
Mao Yimin
Geng Junhao
Deborah Simon Mwakapesa
Yaser Ahangari Nanehkaran
Zhang Chi
Deng Xiaoheng
Chen Zhigang
Affiliations
[1] Jiangxi University of Science and Technology, School of Information Engineering
[2] Central South University, School of Computer Science and Engineering
Source
Multimedia Systems | 2021 / Vol. 27
Keywords
DiffNodeset structure; MapReduce; 2-Way comparison strategy; Load balancing strategy based on dynamic grouping; Frequent item mining
DOI
Not available
Abstract
Frequent itemset mining (FIM) is an important data mining technique that is widely used in numerous applications to discover frequent itemsets. With the rapid growth of datasets, FIM has become an active research topic, and many new FIM algorithms have been developed for the big data environment. This study designs an optimized parallel FIM algorithm based on MapReduce, named PFIMD, to address two problems of existing parallel FIM algorithms: the high time and space complexity of processing and computing itemsets, and their failure to adequately balance the load among parallel tasks. First, a structure called DiffNodeset is adopted to effectively prevent the growth of N-list cardinality in the MRPrePost algorithm.
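To make the DiffNodeset idea concrete, the sketch below computes the DiffNodeset of a 2-itemset and derives its support from it. It assumes the PPC-tree encoding used by the PrePost family, in which every node carries a (pre-order, post-order, count) code and node A is an ancestor of node B exactly when A.pre < B.pre and A.post > B.post. It further assumes that, for a 2-itemset {i1, i2} where i2 precedes i1 in F-list, the DiffNodeset is the set of i1-nodes that have no i2-node as an ancestor, so that support({i1, i2}) = support(i1) minus the total count of that set. The names `PPCode`, `diff_nodeset`, and `support_2itemset` are hypothetical; this is an illustration under those assumptions, not the PFIMD implementation.

```python
from typing import List, NamedTuple


class PPCode(NamedTuple):
    """Code of one PPC-tree node: pre-order rank, post-order rank, count."""
    pre: int
    post: int
    count: int


def is_ancestor(a: PPCode, b: PPCode) -> bool:
    """a is an ancestor of b iff a comes first in pre-order and last in post-order."""
    return a.pre < b.pre and a.post > b.post


def diff_nodeset(nodes_i1: List[PPCode], nodes_i2: List[PPCode]) -> List[PPCode]:
    """Return the i1-nodes that have no ancestor among the i2-nodes.

    Both lists must be sorted by pre-order rank. Nodes of one item never lie on
    the same root-to-leaf path, so one forward pass over each list suffices.
    """
    result: List[PPCode] = []
    j = 0
    for node in nodes_i1:
        # Skip i2-nodes whose subtrees end before `node` begins.
        while (j < len(nodes_i2)
               and nodes_i2[j].pre < node.pre
               and nodes_i2[j].post < node.post):
            j += 1
        # The next remaining i2-node is either an ancestor of `node` or lies after it.
        if not (j < len(nodes_i2) and is_ancestor(nodes_i2[j], node)):
            result.append(node)
    return result


def support_2itemset(nodes_i1: List[PPCode], nodes_i2: List[PPCode]) -> int:
    """support({i1, i2}) = support(i1) - total count of the DiffNodeset."""
    support_i1 = sum(n.count for n in nodes_i1)
    return support_i1 - sum(n.count for n in diff_nodeset(nodes_i1, nodes_i2))


# Toy PPC-tree for transactions {a,b,c} x2, {a,b}, {a,c}, {b,c}; F-list a, b, c:
#   root -> a:4 -> b:3 -> c:2
#               -> c:1
#        -> b:1 -> c:1
a_nodes = [PPCode(1, 3, 4)]
b_nodes = [PPCode(2, 1, 3), PPCode(5, 5, 1)]
c_nodes = [PPCode(3, 0, 2), PPCode(4, 2, 1), PPCode(6, 4, 1)]

print(diff_nodeset(c_nodes, b_nodes))      # [PPCode(pre=4, post=2, count=1)]
print(support_2itemset(c_nodes, b_nodes))  # 3
print(support_2itemset(c_nodes, a_nodes))  # 3
```

Because both code lists are sorted by pre-order rank, the single forward pass runs in O(m + n) time instead of the O(m·n) needed to test every pair of nodes.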
Then, a 2-way comparison strategy is designed to speed up the generation of the DiffNodesets of 2-itemsets and reduce the time complexity of the algorithm. Next, the steps of the improved algorithm are parallelized on the Hadoop cloud computing platform using the MapReduce programming model. Moreover, to group the items of F-list uniformly, a load balancing strategy based on dynamic grouping is proposed, which solves the problem of uneven load across the nodes of the cluster. The experimental results show that the modified algorithm not only overcomes the shortcomings of MRPrePost in the big data environment but also greatly reduces time and space complexity. Finally, applications of the PFIMD algorithm to several multimedia datasets are presented to illustrate its general applicability.
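The abstract does not describe how PFIMD's dynamic grouping assigns F-list items to groups. As a hedged illustration of the general goal (even load per group), the sketch below uses the classic greedy longest-processing-time (LPT) heuristic: items are taken in decreasing order of estimated cost, and each one goes to the currently lightest group. The per-item cost used here, the item's 1-based position in F-list, is a hypothetical proxy chosen only for illustration (items later in F-list tend to carry longer conditional prefixes); it is not the paper's cost model, and `balanced_groups` and `group_loads` are hypothetical names.

```python
import heapq
from typing import Dict, List


def balanced_groups(item_costs: Dict[str, float], num_groups: int) -> List[List[str]]:
    """Greedy LPT assignment of items to `num_groups` groups by estimated cost."""
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    groups: List[List[str]] = [[] for _ in range(num_groups)]
    # Min-heap of (accumulated cost, group id); ties resolve to the lower id.
    heap = [(0.0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    # Heaviest items first; ties broken by item name for deterministic output.
    for item, cost in sorted(item_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (load + cost, g))
    return groups


def group_loads(groups: List[List[str]], item_costs: Dict[str, float]) -> List[float]:
    """Total estimated cost of each group."""
    return [sum(item_costs[i] for i in grp) for grp in groups]


f_list = ["a", "b", "c", "d", "e", "f"]  # sorted by descending support
# Hypothetical cost proxy: the item's 1-based position in F-list.
costs = {item: float(pos) for pos, item in enumerate(f_list, start=1)}

groups = balanced_groups(costs, 2)
print(groups)                      # [['f', 'c', 'b'], ['e', 'd', 'a']]
print(group_loads(groups, costs))  # [11.0, 10.0]
```

By comparison, splitting this F-list into two contiguous halves would give estimated loads of 6 and 15.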
Pages: 709-722
Page count: 13
Related papers (50 in total)
  • [41] A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data
    Xia, Dawen
    Lu, Xiaonan
    Li, Huaqing
    Wang, Wendong
    Li, Yantao
    Zhang, Zili
    COMPLEXITY, 2018
  • [42] Frequent itemset mining-based spatial subclustering algorithm
    Wang, Qian
    Gao, Zhi-Peng
    Qiu, Xue-Song
    Wang, Xing-Bin
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2015, 38: 20-23
  • [43] A Spark-based Incremental Algorithm for Frequent Itemset Mining
    Wen, Haoxing
    Li, Xiaoguang
    Kou, Mingdong
    Tou, Huaixiao
    He, Hengyi
    Yang, Yulu
    BDIOT 2018: PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON BIG DATA AND INTERNET OF THINGS, 2018: 53-58
  • [44] A frequent itemset mining algorithm based on composite granular computing
    Wu, Hongjuan
    Liu, Yulu
    Yan, Pei
    Fang, Gang
    Zhong, Jing
    JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2018, 18(01): 247-257
  • [45] Improvement of Eclat Algorithm Based on Support in Frequent Itemset Mining
    Yu, Xiaomei
    Wang, Hong
    JOURNAL OF COMPUTERS, 2014, 9(09): 2116-2123
  • [46] A Heuristic Rule based Approximate Frequent Itemset Mining Algorithm
    Li, Haifeng
    Zhang, Yuejin
    Zhang, Ning
    Jia, Hengyue
    PROMOTING BUSINESS ANALYTICS AND QUANTITATIVE MANAGEMENT OF TECHNOLOGY: 4TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT (ITQM 2016), 2016, 91: 324-333
  • [47] MapReduce-Based Parallel Algorithm for Detecting and Resolving of Firewall Policy Conflict
    Xiao, Qi
    Qin, Yunchuan
    Li, Kenli
    HIGH PERFORMANCE COMPUTING, 2013, 207: 118-131
  • [48] An Iterative MapReduce Based Frequent Subgraph Mining Algorithm
    Bhuiyan, Mansurul A.
    Al Hasan, Mohammad
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27(03): 608-620
  • [49] An efficient algorithm of frequent itemsets mining based on MapReduce
    Wang, Le
    Feng, Lin
    Zhang, Jing
    Liao, Pengyu
    Journal of Information and Computational Science, 2014, 11(08): 2809-2816
  • [50] An Improved Version of the Frequent Itemset Mining Algorithm
    Butincu, Cristian Nicolae
    Craus, Mitica
    2015 14TH ROEDUNET INTERNATIONAL CONFERENCE - NETWORKING IN EDUCATION AND RESEARCH (ROEDUNET NER), 2015: 184-189