Near-Optimal Explainable k-Means for All Dimensions

Citations: 0
Authors
Charikar, Moses [1]
Hu, Lunjia [1]
Affiliations
[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
Keywords
LOCAL SEARCH YIELDS; BLACK-BOX;
DOI
Not available
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Discipline code
081202 ;
Abstract
Many clustering algorithms are guided by certain cost functions such as the widely-used k-means cost. These algorithms divide data points into clusters with often complicated boundaries, creating difficulties in explaining the clustering decision. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given d-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose k-means cost is at most k^{1-2/d} · poly(d log k) times the minimum cost achievable by a clustering without the explainability constraint, assuming k, d >= 2. Taking the minimum of this bound and the k · polylog(k) bound in independent work by Makarychev-Shan (ICML 2021), Gamlath-Jia-Polak-Svensson (2021), or Esfandiari-Mirrokni-Narayanan (2021), we get an improved bound of k^{1-2/d} · polylog(k), which we show is optimal for every choice of k, d >= 2 up to a poly-logarithmic factor in k. For d = 2 in particular, we show an O(log k log log k) bound, improving near-exponentially over the previous best bound of O(k log k) by Laber and Murtinho (ICML 2021).
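To make the setting concrete: an explainable clustering in the sense of the abstract is a decision tree whose internal nodes test a single coordinate against a threshold (an axis-parallel cut), with one cluster per leaf. The sketch below is an illustrative toy, not the paper's algorithm: given a set of reference centers, it builds a tree by repeatedly cutting at the midpoint of the largest gap along the widest coordinate (a hypothetical heuristic chosen for simplicity), then classifies points by walking the tree.

```python
import numpy as np

def build_threshold_tree(centers, ids=None):
    """Recursively separate centers with axis-parallel cuts.

    An internal node is (axis, threshold, left, right); a leaf is
    ('leaf', cluster_id). The cut rule here (largest gap along the
    widest coordinate) is only an illustrative heuristic.
    """
    centers = np.asarray(centers, dtype=float)
    if ids is None:
        ids = list(range(len(centers)))
    if len(ids) == 1:
        return ('leaf', ids[0])
    pts = centers[ids]
    # Widest coordinate among the centers still at this node.
    axis = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
    vals = np.sort(pts[:, axis])
    # Cut at the midpoint of the largest gap along that coordinate.
    i = int(np.argmax(np.diff(vals)))
    thr = (vals[i] + vals[i + 1]) / 2.0
    left = [c for c in ids if centers[c, axis] <= thr]
    right = [c for c in ids if centers[c, axis] > thr]
    return (axis, thr,
            build_threshold_tree(centers, left),
            build_threshold_tree(centers, right))

def assign(tree, x):
    """Follow the axis-parallel cuts down to a leaf's cluster id."""
    while tree[0] != 'leaf':
        axis, thr, left, right = tree
        tree = left if x[axis] <= thr else right
    return tree[1]

centers = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
tree = build_threshold_tree(centers)
print(assign(tree, [9.0, 1.0]))  # nearest to center 1
```

Each point's cluster is then explained by the short list of coordinate comparisons on its root-to-leaf path; the paper's question is how much k-means cost such a restricted partition must sacrifice relative to the unconstrained optimum.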
Pages: 2580-2606
Page count: 27
Related Papers
50 items in total
  • [41] K-means tree: an optimal clustering tree for unsupervised learning
    Pooya Tavallali
    Peyman Tavallali
    Mukesh Singhal
    The Journal of Supercomputing, 2021, 77 : 5239 - 5266
  • [42] An iterative algorithm for optimal variable weighting in K-means clustering
    Zhang, Shaonan
    Li, Shanshan
    Hu, Jiaqiao
    Xing, Haipeng
    Zhu, Wei
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2019, 48 (05) : 1346 - 1365
  • [43] K-means algorithm in the optimal initial centroids based on dissimilarity
    Shunye, Wang
    Yeqin, Cui
    Zuotao, Jin
    Xinyuan, Liu
    Journal of Chemical and Pharmaceutical Research, 2013, 5 (12) : 745 - 749
  • [44] An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
    Mao, Yingchi
    Xu, Ziyang
    Li, Xiaofang
    Ping, Ping
    2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 3149 - 3156
  • [45] Randomized Sketches for Clustering: Fast and Optimal Kernel k-Means
    Yin, Rong
    Liu, Yong
    Wang, Weiping
    Meng, Dan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022
  • [46] An Optimal Distributed K-Means Clustering Algorithm Based on CloudStack
    Mao, Yingchi
    Xu, Ziyang
    Ping, Ping
    Wang, Longbao
    2015 NINTH INTERNATIONAL CONFERENCE ON FRONTIER OF COMPUTER SCIENCE AND TECHNOLOGY FCST 2015, 2015, : 386 - 391
  • [47] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [48] Deep k-Means: Jointly clustering with k-Means and learning representations
    Fard, Maziar Moradi
    Thonet, Thibaut
    Gaussier, Eric
    PATTERN RECOGNITION LETTERS, 2020, 138 : 185 - 192
  • [49] K and starting means for k-means algorithm
    Fahim, Ahmed
    JOURNAL OF COMPUTATIONAL SCIENCE, 2021, 55
  • [50] Learning the k in k-means
    Hamerly, G
    Elkan, C
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 281 - 288