Near-Optimal Explainable k-Means for All Dimensions

Citations: 0
Authors
Charikar, Moses [1]
Hu, Lunjia [1]
Affiliations
[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
Keywords
LOCAL SEARCH YIELDS; BLACK-BOX;
DOI
Not available
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Discipline code
081202 ;
Abstract
Many clustering algorithms are guided by certain cost functions such as the widely-used k-means cost. These algorithms divide data points into clusters with often complicated boundaries, creating difficulties in explaining the clustering decision. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given d-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose k-means cost is at most k^{1-2/d} · poly(d log k) times the minimum cost achievable by a clustering without the explainability constraint, assuming k, d >= 2. Taking the minimum of this bound and the k · polylog(k) bound in independent work by Makarychev-Shan (ICML 2021), Gamlath-Jia-Polak-Svensson (2021), or Esfandiari-Mirrokni-Narayanan (2021), we get an improved bound of k^{1-2/d} · polylog(k), which we show is optimal for every choice of k, d >= 2 up to a poly-logarithmic factor in k. For d = 2 in particular, we show an O(log k log log k) bound, improving near-exponentially over the previous best bound of O(k log k) by Laber and Murtinho (ICML 2021).
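To make the setting concrete: an explainable clustering in the sense of the abstract is a decision tree whose internal nodes test a single coordinate against a threshold (an axis-parallel cut), with one cluster per leaf. The sketch below is an illustrative toy, not the paper's algorithm: given a set of reference centers, it builds a tree by repeatedly cutting at the midpoint of the largest gap along the widest coordinate (a hypothetical heuristic chosen for simplicity), then classifies points by walking the tree.

```python
import numpy as np

def build_threshold_tree(centers, ids=None):
    """Recursively separate centers with axis-parallel cuts.

    An internal node is (axis, threshold, left, right); a leaf is
    ('leaf', cluster_id). The cut rule here (largest gap along the
    widest coordinate) is only an illustrative heuristic.
    """
    centers = np.asarray(centers, dtype=float)
    if ids is None:
        ids = list(range(len(centers)))
    if len(ids) == 1:
        return ('leaf', ids[0])
    pts = centers[ids]
    # Widest coordinate among the centers still at this node.
    axis = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
    vals = np.sort(pts[:, axis])
    # Cut at the midpoint of the largest gap along that coordinate.
    i = int(np.argmax(np.diff(vals)))
    thr = (vals[i] + vals[i + 1]) / 2.0
    left = [c for c in ids if centers[c, axis] <= thr]
    right = [c for c in ids if centers[c, axis] > thr]
    return (axis, thr,
            build_threshold_tree(centers, left),
            build_threshold_tree(centers, right))

def assign(tree, x):
    """Follow the axis-parallel cuts down to a leaf's cluster id."""
    while tree[0] != 'leaf':
        axis, thr, left, right = tree
        tree = left if x[axis] <= thr else right
    return tree[1]

centers = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
tree = build_threshold_tree(centers)
print(assign(tree, [9.0, 1.0]))  # nearest to center 1
```

Each point's cluster is then explained by the short list of coordinate comparisons on its root-to-leaf path; the paper's question is how much k-means cost such a restricted partition must sacrifice relative to the unconstrained optimum.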
Pages: 2580-2606
Page count: 27
Related Papers
50 items in total
  • [41] K-means tree: an optimal clustering tree for unsupervised learning
    Pooya Tavallali
    Peyman Tavallali
    Mukesh Singhal
    The Journal of Supercomputing, 2021, 77 : 5239 - 5266
  • [42] An iterative algorithm for optimal variable weighting in K-means clustering
    Zhang, Shaonan
    Li, Shanshan
    Hu, Jiaqiao
    Xing, Haipeng
    Zhu, Wei
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2019, 48 (05) : 1346 - 1365
  • [43] K-means algorithm in the optimal initial centroids based on dissimilarity
    Shunye, Wang
    Yeqin, Cui
    Zuotao, Jin
    Xinyuan, Liu
    Journal of Chemical and Pharmaceutical Research, 2013, 5 (12) : 745 - 749
  • [44] An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
    Mao, Yingchi
    Xu, Ziyang
    Li, Xiaofang
    Ping, Ping
    2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 3149 - 3156
  • [45] Randomized Sketches for Clustering: Fast and Optimal Kernel k-Means
    Yin, Rong
    Liu, Yong
    Wang, Weiping
    Meng, Dan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [46] An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
    Mao, Yingchi
    Xu, Ziyang
    Ping, Ping
    Wang, Longbao
    2015 NINTH INTERNATIONAL CONFERENCE ON FRONTIER OF COMPUTER SCIENCE AND TECHNOLOGY FCST 2015, 2015, : 386 - 391
  • [47] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [48] Deep k-Means: Jointly clustering with k-Means and learning representations
    Fard, Maziar Moradi
    Thonet, Thibaut
    Gaussier, Eric
    PATTERN RECOGNITION LETTERS, 2020, 138 : 185 - 192
  • [49] K and starting means for k-means algorithm
    Fahim, Ahmed
    JOURNAL OF COMPUTATIONAL SCIENCE, 2021, 55
  • [50] Learning the k in k-means
    Hamerly, G
    Elkan, C
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 281 - 288