HashEclat: an efficient frequent itemset algorithm

Cited by: 27
Authors
Zhang, Chunkai [1]
Tian, Panbo [1]
Zhang, Xudong [1]
Liao, Qing [1]
Jiang, Zoe L. [1]
Wang, Xuan [1]
Affiliations
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci & Technol, Shenzhen, Peoples R China
Keywords
Frequent itemset; MinHash; Approximate algorithm; Eclat
DOI
10.1007/s13042-018-00918-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, the inefficiency of calculating the intersection of itemsets makes it time-consuming, especially when the dataset has a large number of transactions. To improve efficiency, we propose an approximate Eclat algorithm named HashEclat, based on MinHash, which quickly estimates the size of the intersection set and lets the parameters k, E and minSup be adjusted to trade off the accuracy of the mining results against execution time. The parameter k is the top-k parameter of the one-permutation MinHash algorithm; the parameter E is the estimation error of one intersection size; the parameter minSup is the minimum support threshold. In many real situations, an approximate result obtained faster may be more useful than an 'exact' result. The theoretical analysis and experimental results that we present demonstrate that the proposed algorithm can output almost all of the frequent itemsets with faster speed and less memory space.
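The idea of estimating an intersection size from a top-k, one-permutation MinHash can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`hash_tid`, `bottom_k_sketch`, `estimate_intersection`, `is_probably_frequent`), the use of a BLAKE2b hash as a stand-in for a random permutation, and the choice of a bottom-k sketch over transaction-ID sets are all assumptions made for illustration, and the error parameter E from the abstract is not modelled here.

```python
import hashlib


def hash_tid(tid: int) -> int:
    """Map a transaction ID to a 64-bit integer (stand-in for one random permutation)."""
    digest = hashlib.blake2b(str(tid).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(tidset, k: int):
    """Keep only the k smallest hash values of a transaction-ID set."""
    return sorted(hash_tid(t) for t in tidset)[:k]


def estimate_intersection(sketch_a, size_a: int, sketch_b, size_b: int, k: int) -> float:
    """Estimate |A ∩ B| from two bottom-k sketches and the exact sizes |A| and |B|."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # The k smallest hashes of A ∪ B are always contained in the two sketches.
    union_k = sorted(set_a | set_b)[:k]
    if not union_k:
        return 0.0
    shared = sum(1 for h in union_k if h in set_a and h in set_b)
    jaccard = shared / len(union_k)
    # |A ∩ B| = J * |A ∪ B| and |A| + |B| = |A ∪ B| * (1 + J).
    return jaccard * (size_a + size_b) / (1.0 + jaccard)


def is_probably_frequent(estimated_support: float, min_sup: float) -> bool:
    """Compare an estimated support count against the minimum support threshold."""
    return estimated_support >= min_sup


# Hypothetical tidsets of two items: the transactions in which each item occurs.
tids_x = set(range(0, 6000))
tids_y = set(range(3000, 9000))
k = 512
estimate = estimate_intersection(
    bottom_k_sketch(tids_x, k), len(tids_x),
    bottom_k_sketch(tids_y, k), len(tids_y),
    k,
)
print(estimate, is_probably_frequent(estimate, min_sup=2000))
```

In this sketch a larger k gives a tighter estimate at the cost of more hashing and memory per itemset, which mirrors the accuracy versus execution-time tradeoff the abstract describes. When both sets hold at most k elements between them, the sketches are complete and the estimate coincides with the exact intersection size.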
Pages: 3003-3016
Number of pages: 14
Related papers
50 in total
  • [1] HashEclat: an efficient frequent itemset algorithm
    Chunkai Zhang
    Panbo Tian
    Xudong Zhang
    Qing Liao
    Zoe L. Jiang
    Xuan Wang
    International Journal of Machine Learning and Cybernetics, 2019, 10 : 3003 - 3016
  • [2] An efficient frequent itemset mining algorithm
    Luo, Ke
    Zhang, Xue-Mao
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 756 - 761
  • [3] An efficient algorithm for fuzzy frequent itemset mining
    Wu, Tsu-Yang
    Lin, Jerry Chun-Wei
    Yun, Unil
    Chen, Chun-Hao
    Srivastava, Gautam
    Lv, Xianbiao
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 38 (05) : 5787 - 5797
  • [4] An efficient algorithm for frequent itemset mining on data streams
    Xie Zhi-Jun
    Chen Hong
    Li, Cuiping
    ADVANCES IN DATA MINING: APPLICATIONS IN MEDICINE, WEB MINING, MARKETING, IMAGE AND SIGNAL MINING, 2006, 4065 : 474 - 491
  • [5] An efficient polynomial delay algorithm for pseudo frequent itemset mining
    Uno, Takeaki
    Arimura, Hiroki
    DISCOVERY SCIENCE, PROCEEDINGS, 2007, 4755 : 219 - +
  • [6] A maximal frequent itemset algorithm
    Wang, H
    Li, QH
    Ma, CX
    Li, KL
    ROUGH SETS, FUZZY SETS, DATA MINING, AND GRANULAR COMPUTING, 2003, 2639 : 484 - 490
  • [7] AT-Mine: An Efficient Algorithm of Frequent Itemset Mining on Uncertain Dataset
    Wang, Le
    Feng, Lin
    Wu, Mingfei
    JOURNAL OF COMPUTERS, 2013, 8 (06) : 1417 - 1426
  • [8] MAFIA: A maximal frequent itemset algorithm
    Burdick, D
    Calimlim, M
    Flannick, J
    Gehrke, J
    Yiu, TM
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (11) : 1490 - 1504
  • [9] A parallel algorithm for frequent itemset mining
    Li, L
    Zhai, DH
    Fan, J
    PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT'2003, PROCEEDINGS, 2003, : 868 - 871
  • [10] Efficient Skyline Frequent-Utility Itemset Mining Algorithm on Massive Data
    He, Jingxuan
    Han, Xixian
    Wan, Xiaolong
    Wang, Jinbao
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (07) : 3009 - 3023