An efficient indexing technique for billion-scale nearest neighbor search

Authors
Kaixiang Yang
Hongya Wang
Ming Du
Zhizheng Wang
Zongyuan Tan
Jie Zhang
Yingyuan Xiao
Affiliations
[1] Donghua University, School of Computer Science and Technology
[2] State Key Laboratory of Computer Architecture, Institute of Artificial Intelligence
[3] ICT, School of CSE
[4] CAS
[5] Shanghai Key Laboratory of Computer Software Evaluating and Testing
[6] Donghua University
[7] Tianjin University of Technology
Keywords
Approximate nearest neighbor search; Hierarchical navigable small world graph; Product quantization; Re-rank
DOI
Not available
Abstract
Approximate nearest neighbor search is an indispensable component of many computer vision applications. To index more data, such as images, on a single commercial server, Douze et al. introduced L&C, which targets operating points of 64–128 bytes per vector. While the idea is inspiring, we observe that L&C still suffers from the accuracy saturation problem that it aims to solve. To this end, we propose a simple yet effective two-layer graph index structure, together with dual residual encoding, to attain higher accuracy. Specifically, we partition vectors into multiple clusters and build the top-layer graph over the corresponding centroids. For each cluster, a subgraph is built over compact codes of the first-level vector residuals. This index structure improves graph search precision and saves a considerable number of bytes for compression. We employ second-level residual quantization to re-rank the candidates obtained through graph traversal, which is more efficient than the regression-from-neighbors approach adopted by L&C. Comprehensive experiments show that our proposal obtains over 10% and 30% higher recall@1 than state-of-the-art methods and achieves up to 7.7x and 6.1x speedup over L&C on Deep1B and Sift1B, respectively. Our proposal also attains 90%+ recall@10 and recall@100 on the two billion-scale datasets at a cost of 10 ms per query.
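The dual residual encoding described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: to stay self-contained it replaces both graph layers (the top-layer graph over centroids and the per-cluster subgraphs) with brute-force scans, and every name and parameter here (`kmeans`, `PQ`, `build`, `search`, `n_clusters`, `nprobe`, `n_cand`) is a hypothetical choice made for illustration. It only shows how a first-level residual code yields candidates and how a second-level residual code refines them for re-ranking.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):
                centroids[j] = members.mean(0)
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, d.argmin(1)

class PQ:
    """Product quantizer: one small k-means codebook per sub-vector."""
    def __init__(self, m, ksub):
        self.m, self.ksub = m, ksub

    def fit(self, x):
        self.books = [kmeans(s, self.ksub)[0] for s in np.split(x, self.m, axis=1)]
        return self

    def encode(self, x):
        parts = np.split(x, self.m, axis=1)
        codes = [((p[:, None] - b[None]) ** 2).sum(-1).argmin(1)
                 for p, b in zip(parts, self.books)]
        return np.stack(codes, axis=1).astype(np.uint8)

    def decode(self, codes):
        return np.hstack([b[codes[:, i]] for i, b in enumerate(self.books)])

def build(x, n_clusters=8, m=4, ksub=16):
    centroids, assign = kmeans(x, n_clusters)
    r1 = x - centroids[assign]            # first-level residual (vector - centroid)
    pq1 = PQ(m, ksub).fit(r1)
    codes1 = pq1.encode(r1)
    r2 = r1 - pq1.decode(codes1)          # second-level residual (what pq1 missed)
    pq2 = PQ(m, ksub).fit(r2)
    codes2 = pq2.encode(r2)
    return dict(centroids=centroids, assign=assign,
                pq1=pq1, codes1=codes1, pq2=pq2, codes2=codes2)

def search(index, q, nprobe=2, n_cand=50, topk=1):
    c = index["centroids"]
    # Top layer: brute force over centroids stands in for a graph search.
    probe = ((c - q) ** 2).sum(1).argsort()[:nprobe]
    ids = np.flatnonzero(np.isin(index["assign"], probe))
    # Per-cluster stage: rank by first-level codes (a scan stands in for subgraphs).
    approx1 = c[index["assign"][ids]] + index["pq1"].decode(index["codes1"][ids])
    keep = ((approx1 - q) ** 2).sum(1).argsort()[:n_cand]
    # Re-rank: add the decoded second-level residual to sharpen the distances.
    refined = approx1[keep] + index["pq2"].decode(index["codes2"][ids[keep]])
    order = ((refined - q) ** 2).sum(1).argsort()[:topk]
    return ids[keep][order]

# Toy data only; the paper evaluates on Deep1B and Sift1B.
rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 16)).astype(np.float32)
index = build(data)
result = search(index, data[0], topk=5)
```

In this sketch each vector costs `2 * m` bytes of codes plus a cluster id. The second code is read only for the `n_cand` shortlisted candidates, which is what keeps the re-ranking step cheap.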
Pages: 31673-31689
Page count: 16
Source
Multimedia Tools and Applications, 2023, 82 (20): 31673-31689