Faster Algorithms for the Constrained k-means Problem

Cited by: 26
Authors
Bhattacharya, Anup [1]
Jaiswal, Ragesh [1]
Kumar, Amit [1]
Affiliations
[1] Indian Inst Technol Delhi, Dept Comp Sci & Engn, New Delhi, India
Keywords
Constrained k-means clustering; $D^2$-sampling; CLUSTERING PROBLEMS; PTAS
DOI
10.1007/s00224-017-9820-7
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The classical center-based clustering problems such as k-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not follow such a locality property. For instance, consider the r-gather clustering problem, where there is an additional constraint that each of the clusters should have at least r points, or the capacitated clustering problem, where there is an upper bound on the cluster sizes. Consider a variant of the k-means problem that may be regarded as a general version of such problems. Here, the optimal clusters $O_1, \ldots, O_k$ are an arbitrary partition of the dataset and the goal is to output k centers $c_1, \ldots, c_k$ such that the objective function $\sum_{i=1}^{k} \sum_{x \in O_i} \|x - c_i\|^2$ is minimized. It is not difficult to argue that any algorithm (without knowing the optimal clusters) that outputs a single set of k centers will not behave well as far as optimizing the above objective function is concerned. However, this does not rule out the existence of algorithms that output a list of such k-centers such that at least one of these k-centers behaves well. Given an error parameter $\varepsilon > 0$, let $\ell$ denote the size of the smallest list of k-centers such that at least one of the k-centers gives a $(1 + \varepsilon)$-approximation w.r.t. the objective function above. In this paper, we show an upper bound on $\ell$ by giving a randomized algorithm that outputs a list of $2^{\tilde{O}(k/\varepsilon)}$ k-centers. We also give a closely matching lower bound of $2^{\tilde{\Omega}(k/\sqrt{\varepsilon})}$. Moreover, our algorithm runs in time $O(nd \cdot 2^{\tilde{O}(k/\varepsilon)})$. This is a significant improvement over the previous result of Ding and Xu (2015), who gave an algorithm with running time $O(nd \cdot (\log n)^k \cdot 2^{\mathrm{poly}(k/\varepsilon)})$ and output a list of size $O((\log n)^k \cdot 2^{\mathrm{poly}(k/\varepsilon)})$. Our techniques generalize for the k-median problem and for many other settings where non-Euclidean distance measures are involved.
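As a small illustration of the objective in the abstract, the sketch below evaluates the sum of squared distances for a fixed partition and picks the cheapest entry from a list of candidate k-centers. It is not the paper's algorithm: the function names (`sq_dist`, `partition_cost`, `best_in_list`) and the toy data are hypothetical, and the partition is assumed to be given, whereas the abstract stresses that the optimal clusters are unknown to the algorithm.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def partition_cost(clusters: List[List[Point]], centers: List[Point]) -> float:
    """Objective from the abstract: sum over clusters i and points x in
    cluster i of ||x - c_i||^2, for a FIXED (assumed known) partition."""
    return sum(
        sq_dist(x, c)
        for cluster, c in zip(clusters, centers)
        for x in cluster
    )


def best_in_list(clusters: List[List[Point]],
                 candidates: List[List[Point]]) -> List[Point]:
    """Return the candidate set of k centers with the smallest cost."""
    return min(candidates, key=lambda centers: partition_cost(clusters, centers))


# Toy data (hypothetical): two clusters on a line, two candidate k-centers.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
candidates = [
    [(0.0, 0.0), (10.0, 0.0)],   # cost 0 + 4 + 0 + 4 = 8
    [(1.0, 0.0), (11.0, 0.0)],   # cost 1 + 1 + 1 + 1 = 4 (the centroids)
]
best = best_in_list(clusters, candidates)
```

The second candidate wins because, for a fixed cluster, the centroid minimizes the sum of squared distances. The list-based formulation matters precisely because no single set of centers can be certified without a partition to score it against.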
Pages: 93-115
Page count: 23