Efficient algorithms for deriving complete frequent itemsets from frequent closed itemsets

Cited by: 4
Authors
Wu, Cheng-Wei [1]
Huang, JianTao [1]
Lin, Yun-Wei [1]
Chuang, Chien-Yu [1]
Tseng, Yu-Chee [2]
Affiliations
[1] Natl Ilan Univ, Yilan, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Yilan, Taiwan
Keywords
Frequent itemset mining; Frequent closed itemset mining; Lossless and condensed representation; Deriving algorithms;
DOI
10.1007/s10489-020-02172-7
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When frequent itemsets (FIs) are mined from dense datasets, the number of itemsets produced is usually very large, so the mining task suffers from long execution times and high memory consumption. The frequent closed itemset (FCI) is a compact and lossless representation of FIs. Mining FCIs not only reduces execution time and memory usage but also preserves the complete information of the FIs, which can be derived from the FCIs. Although many studies have proposed efficient methods for mining FCIs, few have developed algorithms for efficiently deriving FIs from FCIs. In this work, we propose two efficient algorithms, DFI-List and DFI-Growth, for deriving FIs from FCIs. Both algorithms adopt depth-first search and a divide-and-conquer methodology to derive all the FIs. DFI-List derives all the FIs with a vertical index structure called Cid List. DFI-Growth compresses the information of FCIs into tree structures and applies a pattern-growth strategy to derive FIs from the trees. Empirical experiments show that DFI-List is the most efficient and scalable algorithm on dense datasets. For example, when the minimum support threshold is set to 50% on the Chess dataset, DFI-List runs more than 100 times faster than LevelWise (Pasquier et al. Inf Syst 24(1): 25-46, 1999b). DFI-Growth, in turn, is the most stable and memory-efficient algorithm on sparse datasets. Both DFI-Growth and DFI-List are superior to the state-of-the-art algorithm (Pasquier et al. Inf Syst 24(1): 25-46, 1999b) in terms of execution time.
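The derivation task described in the abstract rests on a standard property: an itemset is frequent exactly when it is a subset of some FCI, and its support equals the largest support among the FCIs that contain it. The sketch below illustrates that property with a depth-first enumeration over a simple item-to-FCI-id index. It is not the paper's DFI-List or DFI-Growth; the function name `derive_frequent_itemsets`, the `cid_index` layout, and the toy data are assumptions made for illustration only.

```python
def derive_frequent_itemsets(closed):
    """Derive all frequent itemsets and their supports from closed ones.

    closed: dict mapping frozenset(items) -> support count (the FCIs).
    Returns a dict mapping frozenset(items) -> support count (the FIs).
    Illustrative sketch only; not the algorithm from the paper.
    """
    fcis = list(closed.items())

    # Hypothetical vertical index: item -> ids of the FCIs that contain it.
    cid_index = {}
    for cid, (itemset, _support) in enumerate(fcis):
        for item in itemset:
            cid_index.setdefault(item, set()).add(cid)

    items = sorted(cid_index)
    result = {}

    def extend(prefix, cids, start):
        # Depth-first search: grow the prefix with each later item in turn.
        for i in range(start, len(items)):
            item = items[i]
            # FCIs containing the extended itemset = intersection of id sets.
            new_cids = cids & cid_index[item] if prefix else cid_index[item]
            if not new_cids:
                continue  # no closed superset, so the itemset is not frequent
            new_prefix = prefix + (item,)
            # Support of an itemset = maximum support of its closed supersets.
            result[frozenset(new_prefix)] = max(fcis[c][1] for c in new_cids)
            extend(new_prefix, new_cids, i + 1)

    extend((), set(), 0)
    return result


# Toy FCIs for the transactions {a,b,c}, {a,b}, {a,c}, {a} (minimum support 1).
toy_closed = {
    frozenset("a"): 4,
    frozenset("ab"): 2,
    frozenset("ac"): 2,
    frozenset("abc"): 1,
}
toy_fis = derive_frequent_itemsets(toy_closed)
```

On the toy input, `toy_fis` contains seven itemsets, including the non-closed itemsets `{b}`, `{c}`, and `{b, c}`, whose supports are recovered from their closed supersets without rescanning the transactions.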
Pages: 7002-7023
Number of pages: 22
Related papers
50 in total
  • [21] An Algorithm of Mining Closed Frequent Itemsets
    Li, Haifeng
    PROCEEDINGS OF THE 2015 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCES AND AUTOMATION ENGINEERING, 2016, 42 : 95 - 98
  • [22] CloseMiner: Discovering frequent closed itemsets using frequent closed tidsets
    Singh, NG
    Singh, SR
    Mahanta, AK
    Fifth IEEE International Conference on Data Mining, Proceedings, 2005, : 633 - 636
  • [23] An Efficient Algorithm for Mining Closed Frequent Itemsets in Data Streams
    Ao, Fujiang
    Du, Jing
    Yan, Yuejin
    Liu, Baohong
    Huang, Kedi
    8TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY WORKSHOPS: CIT WORKSHOPS 2008, PROCEEDINGS, 2008, : 37 - +
  • [24] PGLCM: efficient parallel mining of closed frequent gradual itemsets
    Trong Dinh Thac Do
    Alexandre Termier
    Anne Laurent
    Benjamin Negrevergne
    Behrooz Omidvar-Tehrani
    Sihem Amer-Yahia
    Knowledge and Information Systems, 2015, 43 : 497 - 527
  • [25] PGLCM: efficient parallel mining of closed frequent gradual itemsets
    Trong Dinh Thac Do
    Termier, Alexandre
    Laurent, Anne
    Negrevergne, Benjamin
    Omidvar-Tehrani, Behrooz
    Amer-Yahia, Sihem
    KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 43 (03) : 497 - 527
  • [26] IFCIA: An efficient algorithm for mining intertransaction frequent closed itemsets
    Dong, Jie
    Han, Min
    FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2007, : 678 - +
  • [27] Mining frequent closed itemsets from distributed repositories
    Lucchese, Claudio
    Orlando, Salvatore
    Perego, Raffaele
    Silvestri, Claudio
    KNOWLEDGE AND DATA MANAGEMENT IN GRIDS, 2007, : 221 - +
  • [28] Mining Frequent Closed Itemsets from Distributed Dataset
    Ju, Chunhua
    Ni, Dongjun
    PROCEEDINGS OF THE 2008 INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN, VOL 1, 2008, : 37 - 41
  • [29] Efficient frequent itemsets mining by sampling
    Zhao, Yanchang
    Zhang, Chengqi
    Zhang, Shichao
    ADVANCES IN INTELLIGENT IT: ACTIVE MEDIA TECHNOLOGY 2006, 2006, 138 : 112 - +
  • [30] Fast Approximation of Probabilistic Frequent Closed Itemsets
    Peterson, Erich A.
    Tang, Peiyi
    PROCEEDINGS OF THE 50TH ANNUAL ASSOCIATION FOR COMPUTING MACHINERY SOUTHEAST CONFERENCE, 2012,