Refined Lower Bounds for Nearest Neighbor Condensation

Cited by: 0

Authors:

Chitnis, Rajesh [1]

Affiliation:

[1] Univ Birmingham, Sch Comp Sci, Birmingham, W Midlands, England

Source:

INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167 | 2022 / Volume 167

Keywords:

nearest neighbor condensation; parameterized complexity; exponential time hypothesis

DOI:

Not available

Chinese Library Classification number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

One of the most commonly used classification techniques is the nearest neighbor rule: given a training set $T$ of labeled points in a metric space $(\mathcal{X}, \rho)$, a new unlabeled point $x \in \mathcal{X}$ is assigned the label of its nearest neighbor in $T$. To improve both the space and time complexity of this classification, it is desirable to reduce the size of the training set without compromising too much on the accuracy of the classification. Hart (1968) formalized this as the NEAREST NEIGHBOR CONDENSATION (NNC) problem: find a subset $C \subseteq T$ of minimum size which is consistent with $T$, i.e., each point $t \in T$ has the same label as that of its nearest neighbor in $C$. This problem is known to be NP-hard (Wilfong, 1991), and the heuristics used in practice often have weak or no theoretical guarantees. We analyze this problem via the refined lens of parameterized complexity, and obtain strong lower bounds for the $k$-NNC-$(\mathbb{Z}^d, \ell_p)$ problem, which asks if there is a consistent subset of size $\le k$ for a given training set of size $n$ in the metric space $(\mathbb{Z}^d, \ell_p)$ for any $1 \le p \le \infty$. First, the $k$-NNC-$(\mathbb{Z}^d, \ell_p)$ problem is W[1]-hard parameterized by $k + d$, i.e., unless FPT = W[1], there is no $f(k, d) \cdot n^{O(1)}$ time algorithm for any computable function $f$. Second, under the Exponential Time Hypothesis (ETH), there is no $d \ge 2$ and computable function $f$ such that the $k$-NNC-$(\mathbb{Z}^d, \ell_p)$ problem can be solved in $f(k, d) \cdot n^{o(k^{1-1/d})}$ time. The second lower bound shows that there is a so-called "limited blessing of low-dimensionality" (Marx and Sidiropoulos, 2014): for small $d$ some improvement might be possible over the brute-force $n^{O(k)}$ time algorithm, but as $d$ becomes large the brute-force algorithm becomes asymptotically optimal. It also shows that the $n^{O(\sqrt{k})}$ time algorithm of Biniaz et al. (2019) for $k$-NNC-$(\mathbb{R}^2, \ell_2)$ is asymptotically tight.
Our lower bounds on the fine-grained complexity of NEAREST NEIGHBOR CONDENSATION in a sense justify the use of heuristics in practice, even though they have weak or no theoretical guarantees.
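
To make the consistency condition and the brute-force $n^{O(k)}$ baseline from the abstract concrete, the following is a minimal sketch, not the paper's own code. The function names (`lp_key`, `is_consistent`, `brute_force_nnc`), the representation of the training set as a list of (point, label) pairs, and the tie-breaking rule (every equally near point of the subset must carry the correct label) are assumptions made for illustration.

```python
from itertools import combinations


def lp_key(a, b, p):
    """Order-preserving stand-in for the l_p distance between points a and b.

    For finite p this returns sum |a_i - b_i|^p (the p-th power of the
    distance), which ranks neighbors identically and stays exact for
    integer coordinates and integer p. For p = inf it returns the max.
    """
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs)


def is_consistent(subset, training, p):
    """True if each training point has the label of its nearest neighbor in subset.

    Assumed tie-breaking: if several subset points are equally near,
    all of them must have the training point's label.
    """
    for point, label in training:
        keyed = [(lp_key(point, c, p), c_label) for c, c_label in subset]
        nearest = min(key for key, _ in keyed)
        if any(c_label != label for key, c_label in keyed if key == nearest):
            return False
    return True


def brute_force_nnc(training, k, p=2):
    """Try every subset of size at most k; return a consistent one or None.

    The number of candidate subsets is n^{O(k)} for n training points.
    """
    for size in range(1, k + 1):
        for subset in combinations(training, size):
            if is_consistent(subset, training, p):
                return list(subset)
    return None


# Hypothetical toy instance in (Z^2, l_2): two well-separated clusters.
training = [((0, 0), "a"), ((1, 0), "a"), ((5, 0), "b"), ((6, 0), "b")]
print(brute_force_nnc(training, k=1))  # no single point is consistent
print(brute_force_nnc(training, k=2))  # one point per cluster suffices
```

The search tries smaller subsets first, so the returned subset has minimum size among those of size at most k.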

Pages: 20
