Near-Optimal Explainable k-Means for All Dimensions

Cited by: 0
Authors
Charikar, Moses [1]
Hu, Lunjia [1]
Affiliations
[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
Keywords
LOCAL SEARCH YIELDS; BLACK-BOX;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Many clustering algorithms are guided by certain cost functions such as the widely used k-means cost. These algorithms divide data points into clusters with often complicated boundaries, creating difficulties in explaining the clustering decision. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given d-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose k-means cost is at most $k^{1-2/d}\,\mathrm{poly}(d \log k)$ times the minimum cost achievable by a clustering without the explainability constraint, assuming $k, d \ge 2$. Taking the minimum of this bound and the $k\,\mathrm{polylog}(k)$ bound in independent work by Makarychev-Shan (ICML 2021), Gamlath-Jia-Polak-Svensson (2021), or Esfandiari-Mirrokni-Narayanan (2021), we get an improved bound of $k^{1-2/d}\,\mathrm{polylog}(k)$, which we show is optimal for every choice of $k, d \ge 2$ up to a poly-logarithmic factor in k. For d = 2 in particular, we show an $O(\log k \log\log k)$ bound, improving near-exponentially over the previous best bound of $O(k \log k)$ by Laber and Murtinho (ICML 2021).
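The quantity the abstract bounds, the ratio between the k-means cost of a decision-tree clustering and the unconstrained minimum cost, can be illustrated on a toy instance. The sketch below is not the paper's algorithm: it uses a hand-picked single-cut tree and finds the unconstrained optimum by brute force, which is only feasible for a handful of points. The names `kmeans_cost`, `split_by_tree`, `optimal_cost`, the nested-tuple tree encoding, and the sample points are all assumptions made for illustration.

```python
from itertools import product

def kmeans_cost(clusters):
    """Sum over clusters of squared distances from each point to the cluster mean."""
    total = 0.0
    for pts in clusters:
        if not pts:
            continue  # an empty cluster contributes nothing
        dim = len(pts[0])
        mean = [sum(p[j] for p in pts) / len(pts) for j in range(dim)]
        total += sum((p[j] - mean[j]) ** 2 for p in pts for j in range(dim))
    return total

def split_by_tree(points, tree):
    """Apply a threshold tree; each leaf becomes one cluster.

    Hypothetical encoding: a tree is None (leaf) or a tuple
    (axis, threshold, left_subtree, right_subtree), where points with
    coordinate <= threshold go left. Every cut is axis-parallel.
    """
    if tree is None:
        return [list(points)]
    axis, threshold, left, right = tree
    lo = [p for p in points if p[axis] <= threshold]
    hi = [p for p in points if p[axis] > threshold]
    return split_by_tree(lo, left) + split_by_tree(hi, right)

def optimal_cost(points, k):
    """Brute-force minimum k-means cost over all assignments to at most k clusters."""
    best = float("inf")
    for labels in product(range(k), repeat=len(points)):
        clusters = [[p for p, c in zip(points, labels) if c == i] for i in range(k)]
        best = min(best, kmeans_cost(clusters))
    return best

# Toy 2-dimensional data (assumed for illustration only).
points = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]
k = 2

# A hand-picked explainable clustering: one cut on axis 0 at x = 1.5.
tree = (0, 1.5, None, None)
tree_clusters = split_by_tree(points, tree)

explainable = kmeans_cost(tree_clusters)
unconstrained = optimal_cost(points, k)
ratio = explainable / unconstrained  # the "price of explainability" for this tree

print(f"tree cost={explainable:.3f} optimum={unconstrained:.3f} ratio={ratio:.3f}")
```

Because the tree clustering is itself a clustering with at most k parts, the ratio is never below 1; the bounds in the abstract state how large it can be forced to be, and how small an efficient algorithm can keep it, as a function of k and d.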
Pages: 2580-2606
Number of pages: 27
Related papers
50 in total
  • [1] Near-optimal Algorithms for Explainable k-Medians and k-Means
    Makarychev, Konstantin
    Shan, Liren
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [2] A Near-Optimal Centroids Initialization in K-means Algorithm Using Bees Algorithm
    Mahmuddin, M.
    Yusof, Y.
    COMPUTING & INFORMATICS, 2009, : 172 - 175
  • [3] Locally Private k-Means Clustering with Constant Multiplicative Approximation and Near-Optimal Additive Error
    Chaturvedi, Anamay
    Jones, Matthew
    Huy Le Nguyen
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6167 - 6174
  • [4] A NEAR-OPTIMAL INITIAL SEED VALUE SELECTION IN K-MEANS ALGORITHM USING A GENETIC ALGORITHM
    BABU, GP
    MURTY, MN
    PATTERN RECOGNITION LETTERS, 1993, 14 (10) : 763 - 769
  • [5] Self-Adjusting Variable Neighborhood Search Algorithm for Near-Optimal k-Means Clustering
    Kazakovtsev, Lev
    Rozhnov, Ivan
    Popov, Aleksey
    Tovbis, Elena
    COMPUTATION, 2020, 8 (04) : 1 - 32
  • [6] K-Means Clustering with Distributed Dimensions
    Ding, Hu
    Liu, Yu
    Huang, Lingxiao
    Li, Jian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [7] Shallow decision trees for explainable k-means clustering
    Laber, Eduardo
    Murtinho, Lucas
    Oliveira, Felipe
    PATTERN RECOGNITION, 2023, 137
  • [8] Explainable Customer Segmentation Using K-means Clustering
    Khan, Riyo Hayat
    Dofadar, Dibyo Fabian
    Alam, Md Golam Rabiul
    2021 IEEE 12TH ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2021, : 639 - 643
  • [9] Towards an Optimal Subspace for K-Means
    Mautz, Dominik
    Ye, Wei
    Plant, Claudia
    Boehm, Christian
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 365 - 373
  • [10] Determining an Optimal Value of K in K-means Clustering
    Mehar, Arshad Muhammad
    Matawie, Kenan
    Maeder, Anthony
    2013 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2013,