Near-Optimal Explainable k-Means for All Dimensions

Authors
Charikar, Moses [1]
Hu, Lunjia [1]
Affiliations
[1] Stanford Univ, Comp Sci Dept, Stanford, CA 94305 USA
Keywords
LOCAL SEARCH YIELDS; BLACK-BOX;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Many clustering algorithms are guided by certain cost functions such as the widely used k-means cost. These algorithms divide data points into clusters with often complicated boundaries, creating difficulties in explaining the clustering decision. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given d-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose k-means cost is at most k^{1-2/d} poly(d log k) times the minimum cost achievable by a clustering without the explainability constraint, assuming k, d >= 2. Taking the minimum of this bound and the k polylog(k) bound in independent work by Makarychev-Shan (ICML 2021), Gamlath-Jia-Polak-Svensson (2021), or Esfandiari-Mirrokni-Narayanan (2021), we get an improved bound of k^{1-2/d} polylog(k), which we show is optimal for every choice of k, d >= 2 up to a poly-logarithmic factor in k. For d = 2 in particular, we show an O(log k log log k) bound, improving near-exponentially over the previous best bound of O(k log k) by Laber and Murtinho (ICML 2021).
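To make the abstract's central question concrete, the following is a minimal sketch of what an explainable clustering is and how its cost is compared against a reference clustering. It is not the algorithm from the paper. The names `assign` and `kmeans_cost`, the tuple encoding of the decision tree, and the toy data are all assumptions made for illustration, and the reference labels are simply a hand-picked clustering rather than a proven optimum.

```python
from typing import Dict, List, Sequence, Tuple, Union

Point = Sequence[float]
# Hypothetical tree encoding: a leaf is an int cluster id; an internal node is
# (feature index, threshold, left subtree, right subtree), i.e. an axis-parallel cut.
Tree = Union[int, Tuple[int, float, "Tree", "Tree"]]


def assign(tree: Tree, x: Point) -> int:
    """Route a point down the threshold tree and return its leaf's cluster id."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def kmeans_cost(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters: Dict[int, List[Point]] = {}
    for x, c in zip(points, labels):
        clusters.setdefault(c, []).append(x)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        mean = [sum(x[j] for x in members) / len(members) for j in range(dim)]
        cost += sum((x[j] - mean[j]) ** 2 for x in members for j in range(dim))
    return cost


# Toy 2-dimensional data with three visually separated groups (illustrative only).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0), (2.0, 5.0), (2.0, 6.0)]
reference_labels = [0, 0, 1, 1, 2, 2]  # hand-picked clustering, no explainability constraint

# A tree whose cuts separate the groups cleanly: first cut on y, then on x.
good_tree: Tree = (1, 3.0, (0, 2.0, 0, 1), 2)
# A tree whose first cut (y <= 5.5) slices through the top group.
bad_tree: Tree = (1, 5.5, (0, 2.0, 0, 1), 2)

reference_cost = kmeans_cost(points, reference_labels)
good_labels = [assign(good_tree, x) for x in points]
bad_labels = [assign(bad_tree, x) for x in points]
good_ratio = kmeans_cost(points, good_labels) / reference_cost
bad_ratio = kmeans_cost(points, bad_labels) / reference_cost
```

On this toy input the first tree reproduces the reference clustering, so its cost ratio is 1, while the second tree pays a larger ratio because one of its cuts splits a group. The bounds in the abstract limit how large this ratio can be, in the worst case over inputs, when the tree is chosen well and the reference is the minimum-cost clustering.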
Pages: 2580-2606
Number of pages: 27