HashEclat: an efficient frequent itemset algorithm

被引:27
|
作者
Zhang, Chunkai [1 ]
Tian, Panbo [1 ]
Zhang, Xudong [1 ]
Liao, Qing [1 ]
Jiang, Zoe L. [1 ]
Wang, Xuan [1 ]
机构
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci & Technol, Shenzhen, Peoples R China
关键词
Frequent itemset; MinHash; Approximate algorithm; Eclat;
D O I
10.1007/s13042-018-00918-x
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, the inefficiency for calculating the intersection of itemsets makes it a time-consuming method, especially when the dataset has a large number of transactions. In this work, for the purpose of efficiency improvement, we proposed an approximate Eclat algorithm named HashEclat based on MinHash, which could quickly estimate the size of the intersection set, and adjust the parameters k, E and minSup to consider the tradeoff between accuracy of the mining results and execution time. The parameter k is the top-k parameter of one-permutation MinHash algorithm; the parameter E is the estimate error of one intersection size; the parameter minSup is the minimum support threshold. In many real situations, an approximate result with faster speed maybe more useful than 'exact' result. The theoretical analysis and experiment results that we present demonstrate that the proposed algorithm can output almost all of the frequent itemset with faster speed and less memory space.
引用
收藏
页码:3003 / 3016
页数:14
相关论文
共 50 条
  • [21] BISC: A Bitmap Itemset Support Counting Approach for Efficient Frequent Itemset Mining
    Chen, Jinlin
    Xiao, Keli
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2010, 4 (03)
  • [22] Efficient Incremental Itemset Tree for Approximate Frequent Itemset Mining On Data Stream
    Bai, Pavitra S.
    Kumar, Ravi G. K.
    PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON APPLIED AND THEORETICAL COMPUTING AND COMMUNICATION TECHNOLOGY (ICATCCT), 2016, : 239 - 242
  • [23] Fast Mining Algorithm of Frequent Itemset Based on Spark
    Ding J.-M.
    Li H.-B.
    Deng B.
    Jia L.-Y.
    You J.-G.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (05): : 2446 - 2464
  • [24] Novel algorithm for frequent itemset mining in data warehouses
    Xu L.-J.
    Xie K.-L.
    Journal of Zhejiang University-SCIENCE A, 2006, 7 (2): : 216 - 224
  • [25] YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark
    Qiu, Hongjian
    Gu, Rong
    Yuan, Chunfeng
    Huang, Yihua
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014, : 1664 - 1671
  • [26] Frequent Itemset Mining Algorithm based on Sampling Method
    Li, Haifeng
    Zhang, Ning
    Zhang, Yuejin
    PROCEEDINGS OF THE 2015 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCES AND AUTOMATION ENGINEERING, 2016, 42 : 852 - 855
  • [27] A New Parallel Algorithm for the Frequent Itemset Mining Problem
    Craus, Mitica
    PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING, 2008, : 165 - 170
  • [28] An Efficient Sliding Window Based Algorithm for Adaptive Frequent Itemset Mining over Data Streams
    Deypir, Mhmood
    Sadreddini, Mohammad Hadi
    Taahomi, Mehran
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2013, 29 (05) : 1001 - 1020
  • [29] A time- and memory-efficient frequent itemset discovering algorithm for association rule mining
    Ivancsy, Renata
    Vajk, Istvan
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2006, 27 (04) : 270 - 280
  • [30] MAFIA: A maximal frequent itemset algorithm for transactional databases
    Burdick, D
    Calimlim, M
    Gehrke, J
    17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2001, : 443 - 452