Fuzzy high-utility pattern mining in parallel and distributed Hadoop framework

Authors
Wu, Jimmy Ming-Tai [1]
Srivastava, Gautam [2, 3]
Wei, Min [1]
Yun, Unil [4]
Lin, Jerry Chun-Wei [5]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Peoples R China
[2] Brandon Univ, Dept Math & Comp Sci, 270 18th St, Brandon, MB R7A 6A9, Canada
[3] China Med Univ, Res Ctr Interneural Comp, Taichung 40402, Taiwan
[4] Sejong Univ, Dept Comp Engn, Seoul, South Korea
[5] Western Norway Univ Appl Sci, Dept Comp Sci Elect Engn & Math Sci, Bergen, Norway
Keywords
Hadoop; High fuzzy utility pattern; High utility itemset mining; Big-data; Fuzzy-set theory; MapReduce; ITEMSETS; ALGORITHM; STRATEGY;
DOI
10.1016/j.ins.2020.12.004
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Over the past decade, high-utility itemset mining (HUIM) has received widespread attention because it can reveal more critical information than frequent itemset mining (FIM). However, HUIM still resembles FIM in that it selects itemsets with a binary model based on a pre-defined minimum utility threshold. In addition, most previous HUIM work has focused on single, small datasets, which is unrealistic for today's real-world big-data environments. In this work, fuzzy-set theory and the MapReduce framework are combined to design a novel high fuzzy utility pattern mining algorithm that addresses these issues. First, fuzzy-set theory is applied and a new algorithm called efficient high fuzzy utility itemset mining (EFUPM) is designed to discover high fuzzy utility patterns on a single machine. Two upper bounds are then estimated to allow early pruning of unpromising candidates in the search space. To handle large-scale datasets, a Hadoop-based high fuzzy utility pattern mining (HFUPM) algorithm is then developed to discover high fuzzy utility patterns on the Hadoop framework. Experimental results show that the proposed algorithms perform well against current state-of-the-art approaches when mining high fuzzy utility patterns, both on a single machine and in a large-scale environment. © 2020 The Author(s). Published by Elsevier Inc.
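The binary threshold model mentioned in the abstract can be illustrated with a minimal sketch. This is not EFUPM or HFUPM: it is a brute-force enumeration of classical (non-fuzzy) high-utility itemsets with no upper-bound pruning and no MapReduce, and the toy transactions, unit profits, and function names are all hypothetical. It only shows that each itemset is either accepted or rejected by comparing its total utility with a minimum utility threshold.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing all items."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force search: keep itemsets whose utility meets the threshold."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            # Binary decision: the itemset is either in or out.
            if utility >= min_utility:
                result[itemset] = utility
    return result


print(high_utility_itemsets(transactions, unit_profit, min_utility=10))
```

With a threshold of 10, the itemset {a, c} (utility 11) is kept while {a, b} (utility 9) is discarded, even though the two utilities are close. This hard cut-off is the kind of binary behavior the abstract contrasts with a fuzzy treatment.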
Pages: 31-48
Number of pages: 18