Near-Optimal Explainable k-Means for All Dimensions

Cited by: 0

Authors:

Charikar, Moses [1]

Hu, Lunjia [1]

Affiliations:

[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA

Source:

PROCEEDINGS OF THE 2022 ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, SODA | 2022

Keywords:

LOCAL SEARCH YIELDS; BLACK-BOX;

DOI:

Not available

Chinese Library Classification number:

TP301 [Theory and Methods];

Discipline classification code:

081202 ;

Abstract:

Many clustering algorithms are guided by certain cost functions such as the widely-used k-means cost. These algorithms divide data points into clusters with often complicated boundaries, creating difficulties in explaining the clustering decision. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given d-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose k-means cost is at most $k^{1-2/d}\,\mathrm{poly}(d \log k)$ times the minimum cost achievable by a clustering without the explainability constraint, assuming $k, d \ge 2$. Taking the minimum of this bound and the $k\,\mathrm{polylog}(k)$ bound in independent work by Makarychev-Shan (ICML 2021), Gamlath-Jia-Polak-Svensson (2021), or Esfandiari-Mirrokni-Narayanan (2021), we get an improved bound of $k^{1-2/d}\,\mathrm{polylog}(k)$, which we show is optimal for every choice of $k, d \ge 2$ up to a poly-logarithmic factor in k. For d = 2 in particular, we show an $O(\log k \log\log k)$ bound, improving near-exponentially over the previous best bound of $O(k \log k)$ by Laber and Murtinho (ICML 2021).
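The quantity the abstract bounds, the ratio between the k-means cost of a decision-tree clustering and that of an unconstrained clustering, can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy points, the reference labels, the hand-written tree, and the helper names `kmeans_cost` and `tree_label` are all assumptions made for illustration, and the reference clustering is simply given rather than proven optimal.

```python
from collections import defaultdict

def kmeans_cost(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        mean = [sum(p[i] for p in members) / len(members) for i in range(dim)]
        cost += sum((p[i] - mean[i]) ** 2 for p in members for i in range(dim))
    return cost

def tree_label(tree, point):
    """Follow axis-parallel threshold cuts down to a leaf.

    Internal node (hypothetical encoding): (feature_index, threshold, left, right).
    Leaf: any non-tuple value, used as the cluster label.
    """
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if point[feature] <= threshold else right
    return tree

# Toy 2-dimensional data with three visually separated groups (assumed for illustration).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0), (2.0, 5.0), (2.0, 6.0)]

# Reference clustering without the explainability constraint (assumed, not computed).
reference_labels = [0, 0, 1, 1, 2, 2]

# Hand-written threshold tree with k = 3 leaves: first cut on y <= 3, then on x <= 2.
tree = (1, 3.0, (0, 2.0, "A", "B"), "C")
tree_labels = [tree_label(tree, p) for p in points]

# Ratio of explainable cost to reference cost on this toy instance.
ratio = kmeans_cost(points, tree_labels) / kmeans_cost(points, reference_labels)
print(tree_labels, ratio)
```

On this easy instance the tree reproduces the reference partition, so the ratio is 1; the bounds in the abstract concern how large this ratio can be forced to grow in the worst case.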

Citation

Pages: 2580-2606

Page count: 27

