We provide a randomized linear-time approximation scheme for a generic problem about clustering of binary vectors subject to additional constraints. The new constrained clustering problem generalizes a number of problems, and by solving it we obtain the first linear-time approximation schemes for a number of well-studied fundamental problems concerning clustering of binary vectors and low-rank approximation of binary matrices. Among the problems solvable by our approach are LOW GF(2)-RANK APPROXIMATION, LOW BOOLEAN-RANK APPROXIMATION, and various versions of BINARY CLUSTERING. For example, in the LOW GF(2)-RANK APPROXIMATION problem, given an $m \times n$ binary matrix $A$ and an integer $r > 0$, we seek a binary matrix $B$ of GF(2)-rank at most $r$ such that the $\ell_0$-norm of the matrix $A - B$ is minimum. For any $\varepsilon > 0$, our algorithm outputs a $(1 + \varepsilon)$-approximate solution with probability at least $(1 - 1/e)$ in time $f(r, \varepsilon) \cdot n \cdot m$, where $f$ is some computable function. This is the first linear-time approximation scheme for these problems. We also give (deterministic) PTASes for these problems running in time $n^{f(r) \frac{1}{\varepsilon^2} \log \frac{1}{\varepsilon}}$, where $f$ is some function depending on the problem. Our algorithm for the constrained clustering problem is based on a novel sampling lemma, which is interesting on its own.
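To make the objective of LOW GF(2)-RANK APPROXIMATION concrete, the following minimal sketch evaluates a candidate solution: it computes the GF(2)-rank of a binary matrix by Gaussian elimination and the $\ell_0$-norm of $A - B$, which for binary matrices is the number of entries where $A$ and $B$ differ. The function names `gf2_rank` and `l0_cost` are hypothetical and chosen for illustration. The sketch only checks feasibility and cost of a given $B$; it is not the approximation scheme described above.

```python
import numpy as np


def gf2_rank(M):
    """Rank of a binary matrix over GF(2), via Gaussian elimination."""
    M = np.array(M, dtype=np.uint8) % 2  # work on a copy, entries in {0, 1}
    rows, cols = M.shape
    rank = 0
    for c in range(cols):
        # Look for a pivot: a row at or below `rank` with a 1 in column c.
        pivot = next((r for r in range(rank, rows) if M[r, c]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]  # move the pivot row into place
        for r in range(rows):
            if r != rank and M[r, c]:
                M[r] ^= M[rank]  # row addition over GF(2) is XOR
        rank += 1
        if rank == rows:
            break
    return rank


def l0_cost(A, B):
    """l0-norm of A - B for binary matrices: the number of differing entries."""
    return int(np.count_nonzero(np.asarray(A) != np.asarray(B)))


# Illustrative input (not from the source).
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1]])
# A candidate solution for r = 1: every row equals (1, 1, 0).
B = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 1, 0]])

print(gf2_rank(A), gf2_rank(B), l0_cost(A, B))
```

Note that rank over GF(2) can differ from rank over the reals: the matrix with rows (1, 1, 0), (0, 1, 1), (1, 0, 1) has real rank 3, but its third row is the XOR of the first two, so its GF(2)-rank is 2.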