Data decomposition for parallel K-means clustering

被引:0
|
作者
Gursoy, A [1 ]
机构
[1] Koc Univ, Dept Comp Engn, TR-34450 Istanbul, Turkey
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Developing fast algorithms for clustering has been an important area of research in data mining and other fields. K-means is one of the widely used clustering algorithms. In this work, we have developed and evaluated parallelization of k-means method for low-dimensional data on message passing computers. Three different data decomposition schemes and their impact on the pruning of distance calculations in tree-based k-means algorithm have been studied. Random pattern decomposition has good load balancing but fails to prune distance calculations effectively. Compact spatial decomposition of patterns based on space filling curves outperforms random pattern decomposition even though it has load imbalance problem. In both cases, parallel tree-based k-means clustering runs significantly faster than the traditional parallel k-means.
引用
收藏
页码:241 / 248
页数:8
相关论文
共 50 条
  • [1] Parallel batch k-means for Big data clustering
    Alguliyev, Rasim M.
    Aliguliyev, Ramiz M.
    Sukhostat, Lyudmila, V
    COMPUTERS & INDUSTRIAL ENGINEERING, 2021, 152
  • [2] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [3] Parallel K-Means Clustering Based on MapReduce
    Zhao, Weizhong
    Ma, Huifang
    He, Qing
    CLOUD COMPUTING, PROCEEDINGS, 2009, 5931 : 674 - 679
  • [4] K-Means Clustering With Incomplete Data
    Wang, Siwei
    Li, Miaomiao
    Hu, Ning
    Zhu, En
    Hu, Jingtao
    Liu, Xinwang
    Yin, Jianping
    IEEE ACCESS, 2019, 7 : 69162 - 69171
  • [5] HdK-Means: Hadoop Based Parallel K-Means Clustering for Big Data
    Bandyopadhyay, Soumyendu Sekhar
    Halder, Anup Kumar
    Chatterjee, Piyali
    Nasipuri, Mita
    Basu, Subhadip
    2017 IEEE CALCUTTA CONFERENCE (CALCON), 2017, : 452 - 456
  • [6] k-Means Clustering of Asymmetric Data
    Olszewski, Dominik
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT I, 2012, 7208 : 243 - 254
  • [7] Soil data clustering by using K-means and fuzzy K-means algorithm
    Hot, Elma
    Popovic-Bugarin, Vesna
    2015 23RD TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2015, : 890 - 893
  • [8] IMPROVEMENT IN K-MEANS CLUSTERING ALGORITHM FOR DATA CLUSTERING
    Rajeswari, K.
    Acharya, Omkar
    Sharma, Mayur
    Kopnar, Mahesh
    Karandikar, Kiran
    1ST INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION ICCUBEA 2015, 2015, : 367 - 369
  • [9] Parallel Set Determination and K-means Clustering for Data Mining on Telecommunication Networks
    Ren, Da-Qi
    Zheng, Da
    Huang, Guowei
    Zhang, Shujie
    Wei, Zane
    2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 1553 - 1557
  • [10] Parallel K-means clustering algorithm on DNA dataset
    Othman, F
    Abdullah, R
    Rashid, NA
    Salam, RA
    PARALLEL AND DISTRIBUTED COMPUTING: APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2004, 3320 : 248 - 251