HashEclat: an efficient frequent itemset algorithm

Cited by: 27
Authors
Zhang, Chunkai [1]
Tian, Panbo [1]
Zhang, Xudong [1]
Liao, Qing [1]
Jiang, Zoe L. [1]
Wang, Xuan [1]
Affiliation
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci & Technol, Shenzhen, Peoples R China
Keywords
Frequent itemset; MinHash; Approximate algorithm; Eclat;
DOI
10.1007/s13042-018-00918-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, computing the intersections of itemsets is inefficient, which makes Eclat time-consuming, especially when the dataset has a large number of transactions. To improve efficiency, we propose an approximate Eclat algorithm named HashEclat, based on MinHash, which quickly estimates the size of the intersection set and adjusts the parameters k, E and minSup to trade off the accuracy of the mining results against execution time. The parameter k is the top-k parameter of the one-permutation MinHash algorithm; the parameter E is the estimation error of one intersection size; the parameter minSup is the minimum support threshold. In many real situations, a faster approximate result may be more useful than an 'exact' result. The theoretical analysis and experimental results that we present demonstrate that the proposed algorithm can output almost all of the frequent itemsets at higher speed and with less memory.
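The core idea in the abstract, estimating the size of a tidset intersection from small hash sketches instead of intersecting the full sets, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a bottom-k variant of MinHash with a single hash function, and the names `h`, `bottom_k`, `estimate_jaccard` and `estimate_intersection` are hypothetical. The error parameter E and the minSup pruning step are not modeled here.

```python
import hashlib


def h(tid: int) -> int:
    """Deterministic 64-bit hash of a transaction id.

    Assumption: this stands in for the single random permutation
    that a one-permutation MinHash scheme relies on.
    """
    digest = hashlib.blake2b(str(tid).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k(tidset, k):
    """Sketch of a tidset: its k smallest hash values, sorted ascending."""
    return sorted(h(t) for t in tidset)[:k]


def estimate_jaccard(sk_a, sk_b, k):
    """Estimate the Jaccard similarity from two bottom-k sketches.

    The k smallest hashes of the union can be recovered from the two
    sketches; the fraction of them present in both sketches estimates
    |A ∩ B| / |A ∪ B|.
    """
    set_a, set_b = set(sk_a), set(sk_b)
    union_k = sorted(set_a | set_b)[:k]
    if not union_k:
        return 0.0
    shared = sum(1 for v in union_k if v in set_a and v in set_b)
    return shared / len(union_k)


def estimate_intersection(size_a, size_b, sk_a, sk_b, k):
    """Estimate |A ∩ B| from the set sizes and the estimated Jaccard J.

    From J = I / (|A| + |B| - I) it follows that
    I = J * (|A| + |B|) / (1 + J).
    """
    j = estimate_jaccard(sk_a, sk_b, k)
    return j * (size_a + size_b) / (1.0 + j)


if __name__ == "__main__":
    # Hypothetical tidsets of two items; their true intersection has 50 ids.
    tids_x = set(range(0, 100))
    tids_y = set(range(50, 150))
    k = 32
    est = estimate_intersection(
        len(tids_x), len(tids_y), bottom_k(tids_x, k), bottom_k(tids_y, k), k
    )
    print(f"estimated support of {{x, y}}: {est:.1f}")
```

In this sketch a larger k gives a more accurate estimate at the cost of more time and memory per sketch, which mirrors the accuracy versus execution time tradeoff the abstract attributes to k. When k is at least the size of the union, the estimate coincides with the exact intersection size up to floating-point rounding.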
Pages: 3003 - 3016
Page count: 14
Related papers
50 in total
  • [41] Frequent itemset mining-based spatial subclustering algorithm
    Wang, Qian
    Gao, Zhi-Peng
    Qiu, Xue-Song
    Wang, Xing-Bin
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2015, 38 : 20 - 23
  • [42] AnyFI: An Anytime Frequent Itemset Mining Algorithm for Data Streams
    Goyal, Poonam
    Challa, Jagat Sesh
    Shrivastava, Shivin
    Goyal, Navneet
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 942 - 947
  • [43] A Parallel Algorithm for Approximate Frequent Itemset Mining using MapReduce
    Fumarola, Fabio
    Malerba, Donato
    2014 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2014, : 335 - 342
  • [44] A Spark-based Incremental Algorithm for Frequent Itemset Mining
    Wen, Haoxing
    Li, Xiaoguang
    Kou, Mingdong
    Tou, Huaixiao
    He, Hengyi
    Yang, Yulu
    BDIOT 2018: PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON BIG DATA AND INTERNET OF THINGS, 2018, : 53 - 58
  • [45] Improvement of Eclat Algorithm Based on Support in Frequent Itemset Mining
    Yu, Xiaomei
    Wang, Hong
    JOURNAL OF COMPUTERS, 2014, 9 (09) : 2116 - 2123
  • [46] A frequent itemset mining algorithm based on composite granular computing
    Wu, Hongjuan
    Liu, Yulu
    Yan, Pei
    Fang, Gang
    Zhong, Jing
    JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2018, 18 (01) : 247 - 257
  • [47] A Heuristic Rule based Approximate Frequent Itemset Mining Algorithm
    Li, Haifeng
    Zhang, Yuejin
    Zhang, Ning
    Jia, Hengyue
    PROMOTING BUSINESS ANALYTICS AND QUANTITATIVE MANAGEMENT OF TECHNOLOGY: 4TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT (ITQM 2016), 2016, 91 : 324 - 333
  • [48] Implementation of an Improved Algorithm for Frequent Itemset Mining using Hadoop
    Agarwal, Ruchi
    Singh, Sunny
    Vats, Satvik
    2016 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND AUTOMATION (ICCCA), 2016, : 13 - 18
  • [49] Hybrid Approach for Improving Efficiency of Apriori Algorithm on Frequent Itemset
    Altameem, Arwa
    Ykhlef, Mourad
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2018, 18 (05) : 151 - 155
  • [50] A data mining proxy approach for efficient frequent itemset mining
    Jeffrey Xu Yu
    Zhiheng Li
    Guimei Liu
    The VLDB Journal, 2008, 17 : 947 - 970