Faster Algorithms for the Constrained k-means Problem

Cited by: 26

Authors:

Bhattacharya, Anup [1]

Jaiswal, Ragesh [1]

Kumar, Amit [1]

Affiliations:

[1] Indian Inst Technol Delhi, Dept Comp Sci & Engn, New Delhi, India

Source:

THEORY OF COMPUTING SYSTEMS | 2018 / Vol. 62 / Issue 01

Keywords:

Constrained k-means clustering; $D^2$-sampling; CLUSTERING PROBLEMS; PTAS

DOI:

10.1007/s00224-017-9820-7

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

The classical center-based clustering problems such as k-means/median/center assume that the optimal clusters satisfy the locality property that the points in the same cluster are close to each other. A number of clustering problems arise in machine learning where the optimal clusters do not follow such a locality property. For instance, consider the r-gather clustering problem where there is an additional constraint that each of the clusters should have at least r points or the capacitated clustering problem where there is an upper bound on the cluster sizes. Consider a variant of the k-means problem that may be regarded as a general version of such problems. Here, the optimal clusters $O_1, \ldots, O_k$ are an arbitrary partition of the dataset and the goal is to output k centers $c_1, \ldots, c_k$ such that the objective function $\sum_{i=1}^{k} \sum_{x \in O_i} \|x - c_i\|^2$ is minimized. It is not difficult to argue that any algorithm (without knowing the optimal clusters) that outputs a single set of k centers will not behave well as far as optimizing the above objective function is concerned. However, this does not rule out the existence of algorithms that output a list of such k centers such that at least one of these k centers behaves well. Given an error parameter $\varepsilon > 0$, let $\ell$ denote the size of the smallest list of k-centers such that at least one of the k-centers gives a $(1 + \varepsilon)$ approximation w.r.t. the objective function above. In this paper, we show an upper bound on $\ell$ by giving a randomized algorithm that outputs a list of $2^{\tilde{O}(k/\varepsilon)}$ k-centers. We also give a closely matching lower bound of $2^{\tilde{\Omega}(k/\sqrt{\varepsilon})}$. Moreover, our algorithm runs in time $O(nd \cdot 2^{\tilde{O}(k/\varepsilon)})$. This is a significant improvement over the previous result of Ding and Xu (2015) who gave an algorithm with running time $O(nd \cdot (\log n)^k \cdot 2^{\mathrm{poly}(k/\varepsilon)})$ and output a list of size $O((\log n)^k \cdot 2^{\mathrm{poly}(k/\varepsilon)})$. Our techniques generalize for the k-median problem and for many other settings where non-Euclidean distance measures are involved.
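To make the objective in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes the partition $O_1, \ldots, O_k$ is already given (in the paper's setting it is unknown to the algorithm) and that center $c_i$ is paired with cluster $O_i$ by position. The function names `sq_dist`, `constrained_cost`, and `best_from_list` are hypothetical and chosen only for illustration. The sketch evaluates $\sum_{i=1}^{k} \sum_{x \in O_i} \|x - c_i\|^2$ and picks the cheapest entry from a list of candidate k-centers.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance ||x - c||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def constrained_cost(clusters: List[List[Point]], centers: List[Point]) -> float:
    """Sum over i of the sum over x in O_i of ||x - c_i||^2.

    Assumption: clusters[i] is paired with centers[i].
    """
    if len(clusters) != len(centers):
        raise ValueError("need exactly one center per cluster")
    return sum(
        sq_dist(x, c)
        for cluster, c in zip(clusters, centers)
        for x in cluster
    )


def best_from_list(
    clusters: List[List[Point]], candidates: List[List[Point]]
) -> List[Point]:
    """Return the candidate k-centers with the smallest cost for this partition."""
    return min(candidates, key=lambda centers: constrained_cost(clusters, centers))
```

For a fixed partition, the cost is minimized by taking each $c_i$ to be the mean of $O_i$, so a candidate list is useful when at least one entry lies close to those means.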


Pages: 93-115

Number of pages: 23

50 items in total

[11] Faster K-Means Cluster Estimation
Khandelwal, Siddhesh
Awekar, Amit
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2017, 2017, 10193 : 520 - 526
[12] Local search approximation algorithms for the k-means problem with penalties
Dongmei Zhang
Chunlin Hao
Chenchen Wu
Dachuan Xu
Zhenning Zhang
Journal of Combinatorial Optimization, 2019, 37 : 439 - 453
[13] Local Search Approximation Algorithms for the Spherical k-Means Problem
Zhang, Dongmei
Cheng, Yukun
Li, Min
Wang, Yishui
Xu, Dachuan
ALGORITHMIC ASPECTS IN INFORMATION AND MANAGEMENT, AAIM 2019, 2019, 11640 : 341 - 351
[14] Local search approximation algorithms for the k-means problem with penalties
Zhang, Dongmei
Hao, Chunlin
Wu, Chenchen
Xu, Dachuan
Zhang, Zhenning
JOURNAL OF COMBINATORIAL OPTIMIZATION, 2019, 37 (02) : 439 - 453
[15] Constrained K-means with external information
Chen Zhigang
Li Xuan
Yang Fan
PROCEEDINGS OF THE 2013 8TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2013), 2013 : 490 - 493
[16] Even Faster Exact k-Means Clustering
Borgelt, Christian
ADVANCES IN INTELLIGENT DATA ANALYSIS XVIII, IDA 2020, 2020, 12080 : 93 - 105
[17] A Modified K-means Algorithms - Bi-Level K-Means Algorithm
Yu, Shyr-Shen
Chu, Shao-Wei
Wang, Ching-Lin
Chan, Yung-Kuan
Chuang, Chia-Yi
PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON SOFT COMPUTING IN INFORMATION COMMUNICATION TECHNOLOGY, 2014 : 10 - 13
[18] Empirical Evaluation of K-Means, Bisecting K-Means, Fuzzy C-Means and Genetic K-Means Clustering Algorithms
Banerjee, Shreya
Choudhary, Ankit
Pal, Somnath
2015 IEEE INTERNATIONAL WIE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (WIECON-ECE), 2015 : 172 - 176
[19] COMPARATIVE STUDY OF MUTATION OPERATORS IN THE GENETIC ALGORITHMS FOR THE K-MEANS PROBLEM
Li, Riu
Kazakovtsev, Lev A.
FACTA UNIVERSITATIS-SERIES MATHEMATICS AND INFORMATICS, 2020, 35 (04) : 1091 - 1105
[20] The provably good parallel seeding algorithms for the k-means problem with penalties
Li, Min
Xu, Dachuan
Zhang, Dongmei
Zhou, Huiling
INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH, 2022, 29 (01) : 158 - 171
