An efficient parallel row enumerated algorithm for mining frequent colossal closed itemsets from high dimensional datasets

Cited by: 12
Authors
Vanahalli, Manjunath K. [1 ]
Patil, Nagamma [1 ]
Affiliations
[1] Natl Inst Technol Karnataka, Dept Informat Technol, Mangalore 575025, India
Keywords
Bioinformatics; High dimensional datasets; Parallel preprocessing; Parallel algorithm; Colossal closed itemsets; Rowset cardinality table; PATTERNS
DOI
10.1016/j.ins.2018.08.009
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Mining colossal itemsets from high dimensional datasets has gained attention in recent times. Conventional algorithms spend most of their time mining small and mid-sized itemsets, which do not contain valuable and complete information for decision making. Mining Frequent Colossal Closed Itemsets (FCCI) from a high dimensional dataset plays a highly significant role in decision making for many applications, especially in the field of bioinformatics. When mining FCCI from a high dimensional dataset, the existing preprocessing techniques fail to prune the complete set of irrelevant features and irrelevant rows. Moreover, the state-of-the-art algorithms for this task are sequential and computationally expensive. The proposed work presents an Effective Improved Parallel Preprocessing (EIPP) technique to prune the complete set of irrelevant features and irrelevant rows from a high dimensional dataset, and a novel, efficient Parallel Frequent Colossal Closed Itemset Mining (PFCCIM) algorithm. Further, the PFCCIM algorithm is integrated with a novel Rowset Cardinality Table (RCT), an efficient method to check the closedness of a rowset, and an efficient pruning strategy to cut down the mining search space. The proposed PFCCIM algorithm is the first parallel algorithm to mine FCCI from a high dimensional dataset. The performance study shows the improved effectiveness of the proposed EIPP technique over the existing preprocessing techniques and the improved efficiency of the proposed PFCCIM algorithm over the existing algorithms. © 2018 Elsevier Inc. All rights reserved.
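To make the terms in the abstract concrete, the following is a minimal brute-force sketch of row enumeration for frequent colossal closed itemsets. It is not the PFCCIM algorithm: it is sequential, uses no Rowset Cardinality Table and no pruning, and is exponential in the number of rows. The function name `closed_colossal_itemsets` and the parameters `min_sup` (minimum support) and `min_len` (minimum itemset length treated as "colossal") are assumptions introduced for illustration only.

```python
from itertools import combinations


def closed_colossal_itemsets(rows, min_sup, min_len):
    """Enumerate rowsets, intersect their rows, and keep closed results.

    rows    : list of frozensets, one itemset per dataset row
    min_sup : minimum number of supporting rows (assumed parameter)
    min_len : minimum itemset length to count as colossal (assumed parameter)
    Returns a dict mapping each qualifying itemset to its support.
    """
    results = {}
    n = len(rows)
    # Only rowsets with at least min_sup rows can yield a frequent itemset.
    for k in range(max(1, min_sup), n + 1):
        for rowset in combinations(range(n), k):
            # The items common to every row of the rowset.
            itemset = frozenset.intersection(*(rows[r] for r in rowset))
            if len(itemset) < min_len:
                continue
            # Closedness check: the rowset is closed only if it already
            # contains every row that supports the itemset.
            support_rows = tuple(r for r in range(n) if itemset <= rows[r])
            if support_rows == rowset:
                results[itemset] = len(support_rows)
    return results


# Toy dataset: 4 rows over single-character items.
toy_rows = [frozenset("abcd"), frozenset("abce"), frozenset("abcd"), frozenset("af")]
found = closed_colossal_itemsets(toy_rows, min_sup=2, min_len=3)
```

On this toy input the sketch keeps the itemset {a, b, c} with support 3 and {a, b, c, d} with support 2. Enumerating rows rather than items suits datasets with few rows and very many features, which is the high dimensional setting the abstract targets.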
Pages: 343-362
Page count: 20