PFIMD: a parallel MapReduce-based algorithm for frequent itemset mining

Authors
Mao Yimin
Geng Junhao
Deborah Simon Mwakapesa
Yaser Ahangari Nanehkaran
Zhang Chi
Deng Xiaoheng
Chen Zhigang
Affiliations
[1] Jiangxi University of Science and Technology,School of Information Engineering
[2] Central South University,School of Computer Science and Engineering
Source
Multimedia Systems | 2021 / Vol. 27
Keywords
DiffNodeset structure; MapReduce; 2-Way comparison strategy; Load balancing strategy based on dynamic grouping; Frequent item mining
DOI
Not available
Abstract
Frequent itemset mining (FIM) is a significant data mining technique that is widely adopted in numerous applications for exploring frequent items. With the rapid growth of datasets, FIM has attracted many researchers, which has prompted the development of numerous FIM algorithms for the big data environment. This study designs an optimized parallel frequent itemset mining algorithm based on MapReduce, named PFIMD, to address the time and space complexity of processing and computing itemsets, as well as the failure of existing parallel FIM algorithms to adequately balance the load among parallel tasks. First, a structure called DiffNodeset is adopted to effectively avoid the growth of N-list cardinality in the MRPrePost algorithm. Then, a 2-way comparison strategy is designed to speed up DiffNodeset generation for 2-itemsets and reduce the time complexity of the algorithm. Next, the steps of the improved algorithm are parallelized using the cloud computing platform Hadoop and the programming model MapReduce. Moreover, to achieve a uniform grouping of the items in the F-list, a load balancing strategy based on dynamic grouping is proposed, which addresses the uneven load across the nodes of the cluster. The experimental results show that the modified algorithm not only overcomes the shortcomings of MRPrePost in the big data environment, but also greatly reduces the time and space complexity. Finally, specific applications of the PFIMD algorithm to several multimedia datasets are listed to illustrate its generality.
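The abstract does not spell out how the dynamic grouping assigns F-list items to groups. As a rough illustration of the general idea of balancing load across groups, the sketch below uses a generic greedy heuristic (heaviest item first, always into the currently lightest group). This is an assumed stand-in, not the PFIMD strategy itself; the function name `group_items` and the per-item load estimates are hypothetical.

```python
import heapq


def group_items(f_list, num_groups):
    """Greedily spread items over groups so that total loads stay close.

    f_list: list of (item, estimated_load) pairs. The load estimate is an
    assumption of this sketch; the abstract does not define one.
    Returns (groups, loads): the items in each group and each group's load.
    """
    # Min-heap of (current total load, group id): the lightest group is on top.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(num_groups)]

    # Place the heaviest items first; ties are broken by item name.
    for item, load in sorted(f_list, key=lambda pair: (-pair[1], pair[0])):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))

    # Read the final per-group totals back out of the heap.
    loads = [0] * num_groups
    for total, g in heap:
        loads[g] = total
    return groups, loads


if __name__ == "__main__":
    # Hypothetical F-list with made-up load estimates.
    f_list = [("a", 9), ("b", 7), ("c", 6), ("d", 5), ("e", 4), ("f", 2)]
    groups, loads = group_items(f_list, 3)
    print(groups, loads)
```

In a MapReduce setting, each resulting group would typically be handled by one reducer, so keeping the group totals close keeps the reducers' work close. How PFIMD estimates load and regroups dynamically is described in the paper, not here.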
Pages: 709 - 722
Number of pages: 13