Fuzzy high-utility pattern mining in parallel and distributed Hadoop framework

Cited by: 50
Authors
Wu, Jimmy Ming-Tai [1]
Srivastava, Gautam [2,3]
Wei, Min [1]
Yun, Unil [4]
Lin, Jerry Chun-Wei [5]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Peoples R China
[2] Brandon Univ, Dept Math & Comp Sci, 270 18th St, Brandon, MB R7A 6A9, Canada
[3] China Med Univ, Res Ctr Interneural Comp, Taichung 40402, Taiwan
[4] Sejong Univ, Dept Comp Engn, Seoul, South Korea
[5] Western Norway Univ Appl Sci, Dept Comp Sci Elect Engn & Math Sci, Bergen, Norway
Keywords
Hadoop; High fuzzy utility pattern; High utility itemset mining; Big-data; Fuzzy-set theory; MapReduce; ITEMSETS; ALGORITHM; STRATEGY
DOI
10.1016/j.ins.2020.12.004
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Over the past decade, high-utility itemset mining (HUIM) has received widespread attention because it can surface more critical information than frequent itemset mining (FIM). Unfortunately, HUIM remains very similar to FIM, since it selects itemsets with a binary model based on a pre-defined minimum utility threshold. In addition, most previous HUIM work has focused on single, small datasets, which is unrealistic for today's real-world big-data environments. In this work, fuzzy-set theory and a MapReduce framework are combined to design a novel high fuzzy utility pattern mining algorithm that addresses both issues. First, fuzzy-set theory is applied and a new algorithm called efficient high fuzzy utility itemset mining (EFUPM) is designed to discover high fuzzy utility patterns on a single machine. Two upper bounds are then estimated to allow early pruning of unpromising candidates in the search space. To handle large-scale datasets, a Hadoop-based high fuzzy utility pattern mining (HFUPM) algorithm is then developed to discover high fuzzy utility patterns on the Hadoop framework. Experimental results show that the proposed algorithms perform well against current state-of-the-art approaches in mining the required high fuzzy utility patterns, both on a single machine and in a large-scale environment. (C) 2020 The Author(s). Published by Elsevier Inc.
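As a rough illustration of how fuzzy-set theory replaces the purely binary quantity model, the sketch below computes a fuzzy utility for a small itemset. It is not the EFUPM or HFUPM algorithm from the paper: the membership functions (three linguistic regions over quantities 1 to 11), the min-based combination rule, the toy transactions, and every function name are assumptions chosen for illustration only, and the upper-bound pruning and MapReduce stages are omitted.

```python
# Hypothetical sketch of a fuzzy utility computation; all definitions here
# are illustrative assumptions, not the formulas used in the paper.

def memberships(quantity):
    """Map a purchase quantity to degrees in three assumed regions.

    L (low) falls from 1 at quantity 1 to 0 at quantity 6,
    M (middle) peaks at quantity 6,
    H (high) rises from 0 at quantity 6 to 1 at quantity 11.
    """
    low = max(0.0, min(1.0, (6 - quantity) / 5))
    middle = max(0.0, 1 - abs(quantity - 6) / 5)
    high = max(0.0, min(1.0, (quantity - 6) / 5))
    return {"L": low, "M": middle, "H": high}


def fuzzy_utility(fuzzy_itemset, transactions, profit):
    """Sum the fuzzy utility of a fuzzy itemset over all transactions.

    fuzzy_itemset: list of (item, region) pairs, e.g. [("A", "M")].
    Per transaction (assumed rule): the minimum membership degree of the
    itemset's members times the ordinary utility (quantity * unit profit).
    Transactions missing any item contribute nothing.
    """
    total = 0.0
    for transaction in transactions:
        if any(item not in transaction for item, _ in fuzzy_itemset):
            continue
        degree = min(
            memberships(transaction[item])[region]
            for item, region in fuzzy_itemset
        )
        utility = sum(transaction[item] * profit[item] for item, _ in fuzzy_itemset)
        total += degree * utility
    return total


def is_high_fuzzy_utility(fuzzy_itemset, transactions, profit, min_utility):
    """Binary decision against a pre-defined minimum utility threshold."""
    return fuzzy_utility(fuzzy_itemset, transactions, profit) >= min_utility


# Toy data: item -> purchased quantity per transaction, and unit profits.
transactions = [{"A": 6, "B": 1}, {"A": 11, "B": 6}, {"B": 6}]
profit = {"A": 2, "B": 5}

print(fuzzy_utility([("A", "M"), ("B", "L")], transactions, profit))
print(is_high_fuzzy_utility([("B", "M")], transactions, profit, min_utility=50))
```

The point of the sketch is only that each item quantity contributes through a graded membership degree rather than a yes/no presence flag; the threshold test at the end is the same kind of cut-off the abstract describes.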
Pages: 31-48
Page count: 18
Related papers
50 in total
  • [41] Efficient Mining of Uncertain Data for High-Utility Itemsets
    Lin, Jerry Chun-Wei
    Gan, Wensheng
    Fournier-Viger, Philippe
    Hong, Tzung-Pei
    Tseng, Vincent S.
    WEB-AGE INFORMATION MANAGEMENT, PT I, 2016, 9658 : 17 - 30
  • [42] Ignoring Internal Utilities in High-Utility Itemset Mining
    Oguz, Damla
    SYMMETRY-BASEL, 2022, 14 (11):
  • [43] Mining high-utility itemsets in dynamic profit databases
    Nguyen, Loan T. T.
    Phuc Nguyen
    Nguyen, Trinh D. D.
    Vo, Bay
    Fournier-Viger, Philippe
    Tseng, Vincent S.
    KNOWLEDGE-BASED SYSTEMS, 2019, 175 : 130 - 144
  • [44] Mining of High-Utility Patterns in Big IoT Databases
    Wu, Jimmy Ming-Tai
    Srivastava, Gautam
    Lin, Jerry Chun-Wei
    Djenouri, Youcef
    Wei, Min
    Polap, Dawid
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING (ICAISC 2021), PT II, 2021, 12855 : 205 - 216
  • [45] Mining High-Utility Sequential Patterns in Uncertain Databases
    Lin, Jerry Chun-Wei
    Srivastava, Gautam
    Li, Yuanfa
    Hong, Tzung-Pei
    Wang, Shyue-Liang
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 5373 - 5380
  • [46] A residual utility-based concept for high-utility itemset mining
    Pushp Sra
    Satish Chand
    Knowledge and Information Systems, 2024, 66 (1) : 211 - 235
  • [47] Efficient algorithms for mining maximal high-utility itemsets
    Nguyen, Trinh D. D.
    Quoc-Bao Vu
    Nguyen, Loan T. T.
    PROCEEDINGS OF 2019 6TH NATIONAL FOUNDATION FOR SCIENCE AND TECHNOLOGY DEVELOPMENT (NAFOSTED) CONFERENCE ON INFORMATION AND COMPUTER SCIENCE (NICS), 2019, : 428 - 433
  • [48] Mining High-Utility Itemsets with Various Discount Strategies
    Lin, Jerry Chun-Wei
    Gan, Wensheng
    Fournier-Viger, Philippe
    Hong, Tzung-Pei
    Tseng, Vincent S.
    PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (IEEE DSAA 2015), 2015, : 742 - 751
  • [49] Efficient Mining of Short Periodic High-Utility Itemsets
    Lin, Jerry Chun-Wei
    Zhang, Jiexiong
    Fournier-Viger, Philippe
    Hong, Tzung-Pei
    Chen, Chien-Ming
    Su, Ja-Hwung
    2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016, : 3083 - 3088
  • [50] High-Utility Itemset Mining with Effective Pruning Strategies
    Wu, Jimmy Ming-Tai
    Lin, Jerry Chun-Wei
    Tamrakar, Ashish
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2019, 13 (06)