Efficient k-Means on GPUs

Cited by: 6

Authors:

Lutz, Clemens [1]

Bress, Sebastian [1]

Rabl, Tilmann [2]

Zeuch, Steffen [1]

Markl, Volker [2]

Affiliations:

[1] DFKI GmbH, Kaiserslautern, Germany

[2] TU Berlin, Berlin, Germany

Source:

14TH INTERNATIONAL WORKSHOP ON DATA MANAGEMENT ON NEW HARDWARE (DAMON 2018) | 2018

Keywords:

DOI:

10.1145/3211922.3211925

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

k-Means is a versatile clustering algorithm widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data-to-knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU. We show that this approach has two main drawbacks. First, it separates the two algorithm phases over different processors, which requires an expensive data exchange between devices. Second, even when both phases are computed on the GPU, the same data are read twice per iteration, leading to inefficient use of memory bandwidth. In this paper, we describe a new approach that executes k-means in a single data pass per iteration. We propose a new algorithm to update centroids that allows us to perform both phases efficiently on GPUs. Thereby, we remove data transfers within each iteration. We fuse both phases to eliminate artificial synchronization barriers, and thus compute k-means in a single data pass. Overall, we achieve up to 20x higher throughput compared to the state-of-the-art approach.
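The single-pass idea in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' GPU algorithm: it is plain CPU Python, and the function names `kmeans_single_pass_iteration` and `kmeans`, the convergence tolerance, and the handling of empty clusters are all assumptions made for illustration. It only shows how fusing the assignment and update phases lets each point be read once per iteration, by accumulating per-cluster sums and counts while assigning.

```python
from math import dist


def kmeans_single_pass_iteration(points, centroids):
    """Run one k-means iteration that reads every point exactly once."""
    k = len(centroids)
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in range(k)]  # per-cluster coordinate sums
    counts = [0] * k                         # per-cluster point counts

    for p in points:
        # Assignment phase: find the index of the nearest centroid.
        nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
        # Update phase, fused into the same pass: accumulate the point
        # into its cluster instead of storing the label for a second scan.
        counts[nearest] += 1
        for d in range(dims):
            sums[nearest][d] += p[d]

    new_centroids = []
    for c in range(k):
        if counts[c] == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(centroids[c]))
        else:
            new_centroids.append([s / counts[c] for s in sums[c]])
    return new_centroids


def kmeans(points, centroids, max_iters=100, tol=1e-9):
    """Repeat single-pass iterations until the centroids stop moving."""
    for _ in range(max_iters):
        updated = kmeans_single_pass_iteration(points, centroids)
        shift = max(dist(a, b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift <= tol:
            break
    return centroids
```

A two-pass variant would first write one label per point and then scan the points again to average them per cluster; the accumulators above avoid that second read. On a GPU the accumulation would additionally need per-thread or per-block partial sums and a reduction, which this sketch does not attempt to model.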


Pages: 3

