Reuse-centric k-means configuration

Cited by: 0

Authors:

Zhang, Lijun [1]

Guan, Hui [1]

Ding, Yufei [2]

Shen, Xipeng [3]

Krim, Hamid [3]

Affiliations:

[1] Univ Massachusetts, Amherst, MA 01002 USA

[2] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA

[3] North Carolina State Univ, Raleigh, NC 27606 USA

Source:

INFORMATION SYSTEMS | 2021 / Volume 100

Funding:

U.S. National Science Foundation;

Keywords:

K-means; Algorithm configuration; Computation reuse; TOP;

DOI:

10.1016/j.is.2021.101787

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

K-means configuration is the task of finding a configuration of k-means (e.g., the number of clusters or the feature set) that maximizes some objective. It is time-consuming because k-means is iterative. This paper proposes reuse-centric k-means configuration to accelerate the process. It builds on the observation that explorations of different configurations share many common or similar computations, so effectively reusing computations from prior trials of different configurations can substantially shorten configuration time. To materialize the idea, the paper presents a set of novel techniques, including reuse-based filtering, center reuse, and a two-phase design, which capitalize on reuse opportunities at three levels: validation, number of clusters, and feature sets. Experiments on k-means-based data classification tasks show that reuse-centric k-means configuration can speed up a heuristic search-based configuration process by a factor of 5.8, and a uniform search-based attainment of classification error surfaces by a factor of 9.1. The paper also provides insights on how to apply the acceleration techniques effectively to realize their full potential. (C) 2021 Elsevier Ltd. All rights reserved.
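
The abstract names center reuse across different numbers of clusters as one of the reuse levels. The sketch below illustrates that general idea only and is not the paper's algorithm: when sweeping k, each run is warm-started from the converged centers of the previous, smaller k instead of from scratch. The function names (`lloyd`, `sweep_with_center_reuse`) and the farthest-point rule for choosing the extra center are assumptions made for this illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers.

    Returns the final centers and the number of iterations performed.
    """
    centers = [tuple(c) for c in centers]
    for it in range(1, max_iter + 1):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # empty cluster: keep its center
        if new_centers == centers:
            return centers, it
        centers = new_centers
    return centers, max_iter


def sweep_with_center_reuse(points, k_values, seed=0):
    """Run k-means for each k, warm-starting from the previous k's centers.

    Hypothetical helper: the extra center for a larger k is the point
    farthest from its nearest existing center (an assumed heuristic).
    """
    rng = random.Random(seed)
    results = {}
    centers = None
    for k in sorted(k_values):
        if centers is None:
            centers = rng.sample(points, k)  # only the first k starts cold
        else:
            centers = list(centers)
            while len(centers) < k:
                far = max(points,
                          key=lambda p: min(sq_dist(p, c) for c in centers))
                centers.append(far)
        centers, iters = lloyd(points, centers)
        results[k] = (centers, iters, inertia(points, centers))
    return results


if __name__ == "__main__":
    rng = random.Random(1)
    blobs = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
             for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
    for k, (_, iters, sse) in sweep_with_center_reuse(blobs, [2, 3, 4]).items():
        print(k, iters, round(sse, 3))
```

Because each larger k starts from the previous solution plus one extra center, the sum of squared errors cannot increase as k grows in this sketch. Whether the warm start also saves iterations depends on the data.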


Pages: 14
