Data decomposition for parallel K-means clustering

Cited by: 0
Authors
Gursoy, A [1]
Affiliation
[1] Koc Univ, Dept Comp Engn, TR-34450 Istanbul, Turkey
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Developing fast clustering algorithms has been an important area of research in data mining and other fields. K-means is one of the most widely used clustering algorithms. In this work, we have developed and evaluated a parallelization of the k-means method for low-dimensional data on message-passing computers. We studied three data decomposition schemes and their impact on the pruning of distance calculations in the tree-based k-means algorithm. Random pattern decomposition balances the load well but fails to prune distance calculations effectively. Compact spatial decomposition of patterns based on space-filling curves outperforms random pattern decomposition even though it suffers from load imbalance. In both cases, parallel tree-based k-means clustering runs significantly faster than traditional parallel k-means.
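The two decomposition schemes named in the abstract can be illustrated with a minimal sketch for 2-D patterns. This is not the paper's implementation. The abstract does not say which space-filling curve is used, so the Z-order (Morton) curve here is an assumption, as are the function names `morton_key`, `split_blocks`, `random_decomposition`, `spatial_decomposition`, and `bbox_area`. The sketch splits patterns into blocks of equal count, which is also an assumption and does not model the load imbalance mentioned above. It only shows that curve-ordered blocks occupy more compact regions than randomly assigned ones, which is the property that tree-based pruning can exploit.

```python
import random


def morton_key(ix, iy, bits=16):
    """Interleave the bits of two non-negative grid coordinates (Z-order key)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)        # x bits go to even positions
        key |= ((iy >> b) & 1) << (2 * b + 1)    # y bits go to odd positions
    return key


def split_blocks(items, nprocs):
    """Cut a list into nprocs contiguous blocks of near-equal size."""
    n = len(items)
    bounds = [i * n // nprocs for i in range(nprocs + 1)]
    return [items[bounds[i]:bounds[i + 1]] for i in range(nprocs)]


def random_decomposition(points, nprocs, seed=0):
    """Shuffle the patterns, then deal them out in equal-sized blocks."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    return split_blocks(shuffled, nprocs)


def spatial_decomposition(points, nprocs, bits=16):
    """Sort the patterns along a Z-order curve, then cut contiguous blocks."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, ymin = min(xs), min(ys)
    xspan = (max(xs) - xmin) or 1.0   # avoid division by zero for flat data
    yspan = (max(ys) - ymin) or 1.0
    scale = (1 << bits) - 1           # largest grid index that fits in `bits`

    def key(p):
        ix = int((p[0] - xmin) / xspan * scale)
        iy = int((p[1] - ymin) / yspan * scale)
        return morton_key(ix, iy, bits)

    return split_blocks(sorted(points, key=key), nprocs)


def bbox_area(block):
    """Area of the axis-aligned bounding box of a block of 2-D points."""
    xs = [p[0] for p in block]
    ys = [p[1] for p in block]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(1024)]
    for name, scheme in [("random", random_decomposition),
                         ("spatial", spatial_decomposition)]:
        blocks = scheme(pts, 4)
        print(name, [round(bbox_area(b), 3) for b in blocks])
```

In a message-passing setting, each block would be sent to one process. With the random scheme every process holds patterns scattered over the whole domain, so few centroids can be ruled out for any group of patterns, whereas a compact block lets many distance calculations be skipped at once.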
Pages: 241-248
Number of pages: 8