Effect of data skewness in parallel mining of association rules

Authors
Cheung, DW [1]
Xiao, YQ [1]
Affiliation
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong
Keywords
association rules; data mining; data skewness; parallel computing
DOI
Not available
CLC classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An efficient parallel algorithm, FPM (Fast Parallel Mining), is proposed for mining association rules on a shared-nothing parallel system. It adopts the count distribution approach and incorporates two powerful candidate pruning techniques: distributed pruning and global pruning. Its communication scheme is simple, performing only one round of message exchange per iteration. We found that both pruning techniques are very sensitive to data skewness, which describes the degree of non-uniformity of the itemset distribution among the database partitions. Distributed pruning is very effective when data skewness is high, whereas global pruning is more effective than distributed pruning even when data skewness is mild. We have implemented the algorithm on an IBM SP2 parallel machine. The performance studies confirm our observation on the relationship between data skewness and the effectiveness of the two pruning techniques. They also show that FPM consistently outperforms CD (Count Distribution), a parallel version of the popular Apriori algorithm [2, 3]. Furthermore, FPM exhibits good parallel performance in terms of speedup, scaleup, and sizeup.
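The abstract names the ingredients of FPM without spelling them out, so the following is a minimal single-process sketch of a count-distribution loop with the two pruning steps, with database partitions simulated as Python lists of frozenset transactions. It is not the authors' IBM SP2 implementation. All function names (local_counts, apriori_gen, distributed_pruning, global_pruning, skewness, fpm_sketch) are hypothetical. The pruning rules follow the usual description of these techniques: a globally large itemset must be locally large in at least one partition (distributed pruning), and a candidate's local count in a partition cannot exceed the smallest local count of its (k-1)-subsets (global pruning). The entropy-based skewness function is an assumed way to quantify "non-uniformity of the itemset distribution" and may differ from the paper's exact definition.

```python
from itertools import combinations
from math import log


def local_counts(partitions, itemsets):
    """Count distribution: each partition counts every candidate on its own data.
    counts[i][X] is the support of itemset X in partition i; summing across
    partitions stands in for the single per-pass exchange of count messages."""
    return [{x: sum(1 for t in part if x <= t) for x in itemsets} for part in partitions]


def apriori_gen(large_prev):
    """Join (k-1)-itemsets into k-itemsets whose (k-1)-subsets are all in large_prev."""
    large_prev = set(large_prev)
    cands = set()
    for a in large_prev:
        for b in large_prev:
            u = a | b
            if len(u) == len(a) + 1 and all(
                frozenset(s) in large_prev for s in combinations(u, len(a))
            ):
                cands.add(u)
    return cands


def distributed_pruning(counts_prev, large_prev, sizes, minsup):
    """A globally large itemset must be locally large in some partition, so only
    join (k-1)-itemsets that are globally large AND locally large in partition i."""
    cands = set()
    for i, size in enumerate(sizes):
        gl_i = {x for x in large_prev if counts_prev[i][x] >= minsup * size}
        cands |= apriori_gen(gl_i)
    return cands


def global_pruning(cands, counts_prev, total, minsup):
    """Bound X's local count in partition i by the smallest local count of its
    (k-1)-subsets; drop X if the summed bound is below the global threshold."""
    kept = set()
    for x in cands:
        subsets = [frozenset(s) for s in combinations(x, len(x) - 1)]
        bound = sum(min(c[y] for y in subsets) for c in counts_prev)
        if bound >= minsup * total:
            kept.add(x)
    return kept


def skewness(counts_of_x):
    """Assumed entropy-based skewness of one itemset across n partitions:
    0.0 when its support is spread evenly, 1.0 when it all sits in one partition."""
    n, total = len(counts_of_x), sum(counts_of_x)
    if n < 2 or total == 0:
        return 0.0
    h = -sum((c / total) * log(c / total) for c in counts_of_x if c > 0)
    return (log(n) - h) / log(n)


def fpm_sketch(partitions, minsup):
    """Level-wise mining over simulated partitions (lists of frozenset transactions)."""
    sizes = [len(p) for p in partitions]
    total = sum(sizes)
    items = {frozenset([i]) for p in partitions for t in p for i in t}
    counts = local_counts(partitions, items)
    large = {x for x in items if sum(c[x] for c in counts) >= minsup * total}
    result = set(large)
    while large:
        cands = distributed_pruning(counts, large, sizes, minsup)
        cands = global_pruning(cands, counts, total, minsup)
        counts = local_counts(partitions, cands)
        large = {x for x in cands if sum(c[x] for c in counts) >= minsup * total}
        result |= large
    return result


if __name__ == "__main__":
    # A deliberately skewed toy database: {a, b} live mostly in partition 0,
    # {c, d} mostly in partition 1.
    p0 = [frozenset(t) for t in ("ab", "ab", "abc", "c")]
    p1 = [frozenset(t) for t in ("cd", "cd", "acd", "d")]
    found = fpm_sketch([p0, p1], minsup=0.3)
    print(sorted("".join(sorted(x)) for x in found))  # ['a', 'ab', 'b', 'c', 'cd', 'd']
```

On this toy input, plain Apriori joining of the four large items would produce six 2-itemset candidates. Distributed pruning keeps only four (ab, ac, bc, cd), because ad and bd cannot be built from items that are large in the same partition. Global pruning then drops bc, whose summed upper bound is 2 against a global threshold of 2.4. This illustrates, on a small example only, why highly skewed partitions make distributed pruning effective.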
Pages: 48–60
Page count: 13