Near-Optimal Explainable k-Means for All Dimensions

Cited by: 0

Authors:

Charikar, Moses [1]

Hu, Lunjia [1]

Affiliation:

[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA

Source:

PROCEEDINGS OF THE 2022 ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, SODA | 2022

Keywords:

LOCAL SEARCH YIELDS; BLACK-BOX;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory and Methods]

Subject classification code:

081202

Abstract:

Many clustering algorithms are guided by cost functions such as the widely used k-means cost. These algorithms divide data points into clusters with often complicated boundaries, which makes the clustering decisions difficult to explain. In recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given d-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose k-means cost is at most $k^{1-2/d}\,\mathrm{poly}(d\log k)$ times the minimum cost achievable by a clustering without the explainability constraint, assuming $k, d \ge 2$. Taking the minimum of this bound and the $k\,\mathrm{polylog}(k)$ bound in independent work by Makarychev-Shan (ICML 2021), Gamlath-Jia-Polak-Svensson (2021), or Esfandiari-Mirrokni-Narayanan (2021), we get an improved bound of $k^{1-2/d}\,\mathrm{polylog}(k)$, which we show is optimal for every choice of $k, d \ge 2$ up to a poly-logarithmic factor in k. For d = 2 in particular, we show an $O(\log k \log\log k)$ bound, improving near-exponentially over the previous best bound of $O(k \log k)$ by Laber and Murtinho (ICML 2021).
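To make the definition of explainable clustering concrete, here is a minimal Python/NumPy sketch: a threshold tree whose internal nodes are axis-parallel tests (coordinate <= threshold) and whose k leaves are the clusters, compared by k-means cost against the unconstrained nearest-center clustering. The cut rule (among cuts lying between two reference centers, pick the one that separates the fewest points from their own center) is only loosely in the spirit of the greedy mistake-minimizing baseline of Dasgupta et al.; it is an illustrative assumption and not the algorithm of this paper. The helper names (`kmeans_cost`, `build_threshold_tree`, `predict`), the toy data, and the requirement that reference centers be distinct are all hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating explainable k-means; NOT the Charikar-Hu algorithm.


def kmeans_cost(points, labels, k):
    """k-means cost: sum of squared distances from each point to its cluster mean."""
    cost = 0.0
    for j in range(k):
        members = points[labels == j]
        if len(members) > 0:
            cost += float(((members - members.mean(axis=0)) ** 2).sum())
    return cost


def build_threshold_tree(points, centers, assign, center_ids):
    """Recursively split with axis-parallel cuts until each leaf holds one center.

    Assumption: the reference centers are distinct. At each node we try every cut
    (dim, theta) lying midway between two consecutive distinct center coordinates
    and keep the one that separates the fewest points from their own center
    ("mistakes"). Returns ("leaf", center_id) or ("node", dim, theta, left, right).
    """
    if len(center_ids) == 1:
        return ("leaf", center_ids[0])
    best = None
    for dim in range(centers.shape[1]):
        vals = np.unique(centers[center_ids, dim])  # sorted distinct coordinates
        for lo, hi in zip(vals[:-1], vals[1:]):
            theta = (lo + hi) / 2.0
            mistakes = int(((points[:, dim] <= theta) != (centers[assign, dim] <= theta)).sum())
            if best is None or mistakes < best[0]:
                best = (mistakes, dim, theta)
    _, dim, theta = best
    pt_left = points[:, dim] <= theta
    ctr_left = centers[assign, dim] <= theta
    keep = pt_left == ctr_left  # mistaken points are dropped from further tree building
    left_ids = [c for c in center_ids if centers[c, dim] <= theta]
    right_ids = [c for c in center_ids if centers[c, dim] > theta]
    left = build_threshold_tree(points[keep & pt_left], centers, assign[keep & pt_left], left_ids)
    right = build_threshold_tree(points[keep & ~pt_left], centers, assign[keep & ~pt_left], right_ids)
    return ("node", dim, theta, left, right)


def predict(tree, x):
    """Route a point down the tree; the leaf's center id is its cluster label."""
    while tree[0] == "node":
        _, dim, theta, left, right = tree
        tree = left if x[dim] <= theta else right
    return tree[1]


# Toy data (hypothetical): three well-separated Gaussian blobs in d = 2.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
k = len(centers)
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Unconstrained clustering: assign each point to its nearest reference center.
assign = np.argmin(((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
base_cost = kmeans_cost(points, assign, k)

# Explainable clustering: labels are decided only by axis-parallel threshold tests.
tree = build_threshold_tree(points, centers, assign, list(range(k)))
explain_labels = np.array([predict(tree, x) for x in points])
explain_cost = kmeans_cost(points, explain_labels, k)

print(f"unconstrained cost: {base_cost:.2f}")
print(f"explainable cost:   {explain_cost:.2f}")
print(f"price of explainability: {explain_cost / base_cost:.3f}")
```

On well-separated data like this toy example, the tree recovers the nearest-center clustering exactly, so the two costs coincide; the bounds stated above concern worst-case inputs, where the price of explainability can grow with k.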


Pages: 2580-2606

Number of pages: 27
