An efficient indexing technique for billion-scale nearest neighbor search

Authors
Kaixiang Yang
Hongya Wang
Ming Du
Zhizheng Wang
Zongyuan Tan
Jie Zhang
Yingyuan Xiao
Affiliations
[1] Donghua University, School of Computer Science and Technology
[2] State Key Laboratory of Computer Architecture, Institute of Artificial Intelligence
[3] ICT, School of CSE
[4] CAS
[5] Shanghai Key Laboratory of Computer Software Evaluating and Testing
[6] Donghua University
[7] Tianjin University of Technology
Source
Multimedia Tools and Applications | 2023 / Vol. 82
Keywords
Approximate nearest neighbor search; Hierarchical navigable small world graph; Product quantization; Re-rank
DOI
Not available
Abstract
Approximate nearest neighbor search is an indispensable component of many computer vision applications. To index more data, such as images, on a single commercial server, Douze et al. introduced L&C, which targets operating points of 64–128 bytes per vector. While the idea is inspiring, we observe that L&C still suffers from the accuracy saturation problem that it was designed to solve. To this end, we propose a simple yet effective two-layer graph index structure, together with dual residual encoding, to attain higher accuracy. Specifically, we partition the vectors into multiple clusters and build the top-layer graph from the corresponding centroids. For each cluster, a subgraph is created over compact codes of the first-level vector residuals. This index structure provides better graph search precision and also saves bytes that can be spent on compression. We employ second-level residual quantization to re-rank the candidates obtained through graph traversal, which is more efficient than the regression-from-neighbors approach adopted by L&C. Comprehensive experiments show that our proposal obtains over 10% and 30% higher recall@1 than state-of-the-art methods, and achieves up to 7.7× and 6.1× speedup over L&C on Deep1B and Sift1B, respectively. Our proposal also attains 90%+ recall@10 and recall@100 on the two billion-scale datasets at a cost of 10 ms per query.
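The dual residual encoding and re-ranking described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It makes several simplifying assumptions: plain k-means vector quantization stands in for product quantization, brute-force scans stand in for the top-layer graph and the per-cluster subgraphs, and the data are random. All names (kmeans, assign, search, n_probe, n_candidates) are hypothetical.

```python
import numpy as np

def assign(x, codebook):
    """Index of the nearest codeword for every row of x."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    codebook = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(x, codebook)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave empty clusters unchanged
                codebook[j] = members.mean(axis=0)
    return codebook

# Toy data (assumption: random vectors, not Deep1B or Sift1B).
data = np.random.default_rng(42).normal(size=(2000, 16)).astype(np.float32)

# Partition the vectors into clusters; the centroids form the top layer.
centroids = kmeans(data, k=16)
cluster = assign(data, centroids)

# First-level residual: vector minus its cluster centroid, then quantized.
res1 = data - centroids[cluster]
book1 = kmeans(res1, k=64, seed=1)
code1 = assign(res1, book1)

# Second-level residual: what the first-level code failed to capture.
res2 = res1 - book1[code1]
book2 = kmeans(res2, k=64, seed=2)
code2 = assign(res2, book2)

# Reconstructions from one and from two levels of codes.
recon1 = centroids[cluster] + book1[code1]
recon2 = recon1 + book2[code2]
err1 = float(((data - recon1) ** 2).sum(axis=1).mean())
err2 = float(((data - recon2) ** 2).sum(axis=1).mean())

def search(query, n_probe=4, n_candidates=50, top_k=10):
    """Coarse search on first-level codes, re-rank with second-level codes."""
    # Brute-force stand-in for traversing the top-layer centroid graph.
    probe = ((centroids - query) ** 2).sum(axis=1).argsort()[:n_probe]
    ids = np.flatnonzero(np.isin(cluster, probe))
    # Brute-force stand-in for traversing the per-cluster subgraphs.
    coarse = ((recon1[ids] - query) ** 2).sum(axis=1)
    cand = ids[coarse.argsort()[:n_candidates]]
    # Re-rank the candidates with the finer two-level reconstruction.
    fine = ((recon2[cand] - query) ** 2).sum(axis=1)
    return cand[fine.argsort()[:top_k]]

result = search(data[0])
print(f"one-level MSE {err1:.3f}, two-level MSE {err2:.3f}")
```

In this sketch the second-level codes are only consulted for the short candidate list, so the finer reconstruction is paid for on a few dozen vectors rather than on every vector visited during the coarse search. A real system would compute distances directly from the stored codes instead of materializing recon1 and recon2.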
Pages: 31673-31689
Page count: 16