Efficient k-Means on GPUs

Cited by: 6
Authors
Lutz, Clemens [1]
Bress, Sebastian [1]
Rabl, Tilmann [2]
Zeuch, Steffen [1]
Markl, Volker [2]
Affiliations
[1] DFKI GmbH, Kaiserslautern, Germany
[2] TU Berlin, Berlin, Germany
DOI
10.1145/3211922.3211925
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
k-Means is a versatile clustering algorithm that is widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data-to-knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU. We show that this approach has two main drawbacks. First, it separates the two algorithm phases over different processors, which requires an expensive data exchange between devices. Second, even when both phases are computed on the GPU, the same data are read twice per iteration, leading to inefficient use of memory bandwidth. In this paper, we describe a new approach that executes k-means in a single data pass per iteration. We propose a new algorithm to update centroids that allows us to perform both phases efficiently on GPUs. Thereby, we remove data transfers within each iteration. We fuse both phases to eliminate artificial synchronization barriers, and thus compute k-means in a single data pass. Overall, we achieve up to 20x higher throughput compared to the state-of-the-art approach.
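The single-pass idea in the abstract can be illustrated with a small CPU-side sketch in plain Python. This is not the authors' GPU implementation; the function name `kmeans_iteration_fused`, the per-cluster `sums` and `counts` accumulators, and the handling of empty clusters are assumptions made for illustration only. The sketch shows how assignment and centroid update can share one read of each point instead of two.

```python
def kmeans_iteration_fused(points, centroids):
    """One k-means iteration that reads every point exactly once.

    Hypothetical illustration: each point is assigned to its nearest
    centroid and immediately added to that cluster's running sum, so
    no second pass over the data is needed for the update phase.
    """
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]  # per-cluster coordinate sums
    counts = [0] * k                        # per-cluster point counts

    for p in points:
        # Assignment phase: nearest centroid by squared Euclidean distance.
        best, best_dist = 0, float("inf")
        for c, centroid in enumerate(centroids):
            dist = sum((a - b) ** 2 for a, b in zip(p, centroid))
            if dist < best_dist:
                best, best_dist = c, dist

        # Update phase, fused into the same pass: accumulate the point.
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]

    # Finalize: mean of each cluster. Assumption: an empty cluster
    # keeps its previous centroid.
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centroids[c])
        for c in range(k)
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = kmeans_iteration_fused(points, centroids)
print(new_centroids)  # [[0.0, 1.0], [10.0, 11.0]]
```

On a GPU the accumulators would be updated by many threads concurrently, which requires a parallel aggregation strategy that this sequential sketch does not attempt to model.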
Pages: 3
Related papers
50 in total
  • [41] Anomaly Detection by Using Streaming K-Means and Batch K-Means
    Wang, Zhuo
    Zhou, Yanghui
    Li, Gangmin
    2020 5TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS (IEEE ICBDA 2020), 2020, : 11 - 17
  • [42] Density K-means: A New Algorithm for Centers Initialization for K-means
    Lan, Xv
    Li, Qian
    Zheng, Yi
    PROCEEDINGS OF 2015 6TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE, 2015, : 958 - 961
  • [43] STiMR k-Means: An Efficient Clustering Method for Big Data
    Ben HajKacem, Mohamed Aymen
    Ben N'cir, Chiheb-Eddine
    Essoussi, Nadia
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2019, 33 (08)
  • [44] Energy Efficient Distance Computing: Application to K-Means Clustering
    Shim, Yong
    Choi, Seong-Wook
    Yang, Myeong-Gyu
    Chung, Keun-Yong
    Baek, Kwang-Hyun
    ELECTRONICS, 2022, 11 (03)
  • [45] A FSM Based Approach for Efficient Implementation of K-Means Algorithm
    Ratnakumar, Rahul
    Nanda, Satyasai Jagannath
    2016 20TH INTERNATIONAL SYMPOSIUM ON VLSI DESIGN AND TEST (VDAT), 2016
  • [46] Efficient MapReduce Kernel k-Means for Big Data Clustering
    Tsapanos, Nikolaos
    Tefas, Anastasios
    Nikolaidis, Nikolaos
    Pitas, Ioannis
    9TH HELLENIC CONFERENCE ON ARTIFICIAL INTELLIGENCE (SETN 2016), 2016
  • [47] An Optimized Algorithm For Efficient Problem Solving In K-MEANS Clustering
    Qureshi, Salim Raza
    Mehta, Sunali
    Gupta, Chaahat
    2017 INTERNATIONAL CONFERENCE ON NEXT GENERATION COMPUTING AND INFORMATION SYSTEMS (ICNGCIS), 2017, : 86 - 91
  • [48] An Efficient Character Recognition Scheme Based on K-Means Clustering
    Pourmohammad, Sajjad
    Soosahabi, Reza
    Maida, Anthony S.
    2013 5TH INTERNATIONAL CONFERENCE ON MODELING, SIMULATION AND APPLIED OPTIMIZATION (ICMSAO), 2013
  • [49] An Efficient Hierarchy-Based of K-Means Clustering Algorithm
    Li Yong-peng
    Zhang Bo-tao
    Zhang Shuai-qin
    2008 INTERNATIONAL WORKSHOP ON INFORMATION TECHNOLOGY AND SECURITY, 2008, : 106 - 110
  • [50] An Effective and Efficient Algorithm for K-Means Clustering With New Formulation
    Nie, Feiping
    Li, Ziheng
    Wang, Rong
    Li, Xuelong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (04) : 3433 - 3443