Reuse-centric k-means configuration

Authors
Zhang, Lijun [1]
Guan, Hui [1]
Ding, Yufei [2]
Shen, Xipeng [3]
Krim, Hamid [3]
Affiliations
[1] Univ Massachusetts, Amherst, MA 01002 USA
[2] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[3] North Carolina State Univ, Raleigh, NC 27606 USA
Funding
National Science Foundation (USA);
Keywords
K-means; Algorithm configuration; Computation reuse; TOP;
DOI
10.1016/j.is.2021.101787
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
K-means configuration is the task of finding a configuration of k-means (e.g., the number of clusters or the feature set) that maximizes some objective. It is time-consuming because k-means is iterative. This paper proposes reuse-centric k-means configuration to accelerate the process. It is based on the observation that explorations of different configurations share many common or similar computations; effectively reusing computations from prior trials of different configurations can substantially shorten configuration time. To realize the idea, the paper presents a set of novel techniques, including reuse-based filtering, center reuse, and a two-phase design, that capitalize on reuse opportunities at three levels: validation, number of clusters, and feature sets. Experiments on k-means-based data classification tasks show that reuse-centric k-means configuration speeds up a heuristic search-based configuration process by a factor of 5.8, and a uniform search-based attainment of classification error surfaces by a factor of 9.1. The paper also provides insights on how to apply the acceleration techniques effectively to realize their full potential. (C) 2021 Elsevier Ltd. All rights reserved.
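The abstract names center reuse across the number of clusters but does not spell out the procedure. The following is a minimal sketch of one plausible reading, not the paper's algorithm: when sweeping k upward, the converged centers from the trial with k clusters seed the trial with k + 1 clusters, with the worst-fit point added as the extra center. All function names (`lloyd`, `grow_centers`, `sweep_k`) and the seeding rule are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def inertia(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))


def lloyd(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    labels = assign(points, centers)
    it = 0
    for it in range(1, max_iter + 1):
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # Mean of the cluster members, coordinate by coordinate.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep the old center if a cluster becomes empty.
                new_centers.append(old)
        centers = new_centers
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels, it


def grow_centers(points, centers, labels):
    """Hypothetical reuse rule: keep the k converged centers and add the
    point farthest from its own center as center k + 1."""
    far = max(range(len(points)),
              key=lambda i: sq_dist(points[i], centers[labels[i]]))
    return list(centers) + [tuple(points[far])]


def sweep_k(points, k_min, k_max, seed=0):
    """Try every k in [k_min, k_max], seeding each trial from the last."""
    rng = random.Random(seed)
    centers = rng.sample(points, k_min)  # only the first trial starts cold
    results = {}
    for k in range(k_min, k_max + 1):
        centers, labels, iters = lloyd(points, centers)
        results[k] = (inertia(points, centers, labels), iters)
        centers = grow_centers(points, centers, labels)
    return results
```

Calling `sweep_k(points, 2, 5)` on a list of coordinate tuples returns a dictionary mapping each k to its inertia and iteration count. Because every trial after the first starts from centers that already fit the data, its inertia cannot exceed that of the previous trial, and it will often need fewer Lloyd iterations than a cold start, though this sketch does not measure that.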
Pages: 14