Refined Lower Bounds for Nearest Neighbor Condensation

Authors
Chitnis, Rajesh [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham, W Midlands, England
Keywords
nearest neighbor condensation; parameterized complexity; exponential time hypothesis
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
One of the most commonly used classification techniques is the nearest neighbor rule: given a training set $T$ of labeled points in a metric space $(X, \rho)$, a new unlabeled point $x \in X$ is assigned the label of its nearest neighbor in $T$. To improve both the space and time complexity of this classification, it is desirable to reduce the size of the training set without compromising too much on the accuracy of the classification. Hart (1968) formalized this as the NEAREST NEIGHBOR CONDENSATION (NNC) problem: find a subset $C \subseteq T$ of minimum size which is consistent with $T$, i.e., each point $t \in T$ has the same label as its nearest neighbor in $C$. This problem is known to be NP-hard (Wilfong, 1991), and the heuristics used in practice often have weak or no theoretical guarantees. We analyze this problem via the refined lens of parameterized complexity, and obtain strong lower bounds for the $k$-NNC-$(\mathbb{Z}^d, \ell_p)$ problem, which asks if there is a consistent subset of size $\le k$ for a given training set of size $n$ in the metric space $(\mathbb{Z}^d, \ell_p)$ for any $1 \le p \le \infty$: (1) The $k$-NNC-$(\mathbb{Z}^d, \ell_p)$ problem is W[1]-hard parameterized by $k + d$, i.e., unless FPT = W[1], there is no $f(k, d) \cdot n^{O(1)}$ time algorithm for any computable function $f$. (2) Under the Exponential Time Hypothesis (ETH), there is no $d \ge 2$ and computable function $f$ such that the $k$-NNC-$(\mathbb{Z}^d, \ell_p)$ problem can be solved in $f(k, d) \cdot n^{o(k^{1-1/d})}$ time. The second lower bound shows that there is a so-called "limited blessing of low-dimensionality" (Marx and Sidiropoulos, 2014): for small $d$ some improvement might be possible over the brute-force $n^{O(k)}$ time algorithm, but as $d$ becomes large the brute-force algorithm becomes asymptotically optimal. It also shows that the $n^{O(\sqrt{k})}$ time algorithm of Biniaz et al. (2019) for $k$-NNC-$(\mathbb{R}^2, \ell_2)$ is asymptotically tight.
Our lower bounds on the fine-grained complexity of NEAREST NEIGHBOR CONDENSATION in a sense justify the use of heuristics in practice, even though they have weak or no theoretical guarantees.
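As an illustration of the consistency condition and of the brute-force search mentioned in the abstract, the following is a minimal Python sketch, not the paper's construction. The names `lp_cost`, `is_consistent`, and `brute_force_nnc` are hypothetical. Two assumptions are made here: `p` is a positive integer or `float("inf")`, and a point counts as correctly classified only if every one of its nearest neighbors in the subset carries its label (the abstract does not specify a tie-breaking rule).

```python
from itertools import combinations

def lp_cost(a, b, p=2):
    """Monotone proxy for the l_p distance between integer tuples a and b.

    For finite p this returns sum |a_i - b_i|^p (the p-th root is skipped,
    which does not change which neighbor is nearest); for p = inf it
    returns max |a_i - b_i|.
    """
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs)

def is_consistent(subset, training, p=2):
    """Check that each training point agrees with its nearest neighbor(s) in subset.

    Assumption: on ties, all nearest neighbors must share the point's label.
    """
    for point, label in training:
        costs = [(lp_cost(point, c, p), c_label) for c, c_label in subset]
        nearest = min(cost for cost, _ in costs)
        if any(cost == nearest and c_label != label for cost, c_label in costs):
            return False
    return True

def brute_force_nnc(training, k, p=2):
    """Try every subset of size at most k; return a smallest consistent one, or None.

    This enumerates n^{O(k)} candidate subsets for a training set of size n.
    """
    for size in range(1, k + 1):
        for subset in combinations(training, size):
            if is_consistent(subset, training, p):
                return list(subset)
    return None

# Usage: two well-separated classes on a line in Z^2.
training = [((0, 0), "a"), ((1, 0), "a"), ((5, 0), "b"), ((6, 0), "b")]
print(brute_force_nnc(training, k=2))
```

In this toy instance no single point is consistent with the whole set, so one representative per class is needed. The lower bounds in the abstract say that, under ETH, no algorithm can improve substantially on this kind of exhaustive enumeration once the dimension is large.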
Pages: 20