Optimizing the Number of Clusters for Billion-Scale Quantization-Based Nearest Neighbor Search

Times cited: 0
Authors
Fu, Yujian [1]
Chen, Cheng [2]
Chen, Xiaohui [2]
Wong, Weng-Fai [1]
He, Bingsheng [1]
Affiliations
[1] National University of Singapore, School of Computing, Singapore 119077, Singapore
[2] ByteDance Inc, Beijing 100086, People's Republic of China
Funding
National Research Foundation, Singapore
Keywords
Indexes; Vectors; Costs; Clustering algorithms; Optimization; Databases; Task analysis; Inverted index; Billion-scale approximate nearest neighbor search; Parameter optimization; Vector quantization; Product quantization; Approximate
DOI
10.1109/TKDE.2024.3408815
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Approximate nearest neighbor search (ANNS) is crucial in many real-world applications, including recommendation systems, data mining, and image retrieval. Quantization-based algorithms have emerged as one of the most efficient solutions for ANNS on billion-scale datasets. However, how to determine the optimal number of clusters, a critical factor for peak performance in quantization-based systems, remains inadequately explored. Previous works often propose cluster counts that are not optimal, and because the configuration space is vast, the absence of effective methods for tuning this parameter leads to suboptimal search performance. To address this challenge, this paper introduces a novel algorithm that automatically identifies the optimal number of clusters for billion-scale, quantization-based ANNS systems to maximize search efficiency. We propose an analytical model for evaluating retrieval performance, which serves as the criterion for optimizing the number of clusters in quantization-based indexes. Our algorithm applies iterative local adjustments to the ANNS index being constructed, progressively refining the number of clusters. We demonstrate the efficacy of our approach using the inverted index, a popular structure in quantization-based ANNS systems. Our findings indicate that: (1) with an optimized number of clusters, the vanilla inverted index achieves better retrieval performance on billion-scale datasets than existing state-of-the-art quantization-based methods; and (2) the additional computational overhead introduced by our optimization algorithm is minimal, even on billion-scale datasets.
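The abstract describes the method only at a high level: an analytical performance model plus iterative local adjustment of the number of clusters. As a rough illustration of that general idea, and not the authors' model or algorithm, the sketch below assumes a toy per-query cost model for an inverted-file (IVF) index: the query is compared with all K centroids (about K·d operations), then `nprobe` lists of roughly N/K codes each are scanned at an assumed cost `code_cost` per code. The function names `search_cost` and `refine_num_clusters`, and all parameters, are hypothetical. The toy model ignores recall entirely, whereas a real tuner must hold retrieval quality fixed.

```python
import math


def search_cost(num_clusters: int, num_vectors: int, dim: int, nprobe: int,
                code_cost: float = 1.0) -> float:
    """Toy per-query cost of an IVF index (an assumption, not the paper's model).

    Coarse stage: compare the query with every centroid, ~num_clusters * dim ops.
    Fine stage: scan nprobe inverted lists of ~num_vectors / num_clusters codes,
    at code_cost ops per code (e.g. product-quantization table lookups).
    """
    coarse = num_clusters * dim
    fine = nprobe * (num_vectors / num_clusters) * code_cost
    return coarse + fine


def refine_num_clusters(initial: int, num_vectors: int, dim: int, nprobe: int,
                        code_cost: float = 1.0, step: float = 2.0,
                        min_step: float = 1.01, max_iters: int = 200) -> int:
    """Hypothetical iterative local adjustment of the cluster count K.

    Try K * step and K / step; move to the first neighbour that lowers the
    modelled cost. If neither helps, shrink the step (step -> sqrt(step))
    and stop once the step falls below min_step.
    """
    k = max(1, initial)
    best = search_cost(k, num_vectors, dim, nprobe, code_cost)
    for _ in range(max_iters):
        if step < min_step:
            break
        improved = False
        for candidate in (max(1, int(k * step)), max(1, int(k / step))):
            cost = search_cost(candidate, num_vectors, dim, nprobe, code_cost)
            if cost < best:
                k, best, improved = candidate, cost, True
                break
        if not improved:
            step = math.sqrt(step)  # refine the search around the current K
    return k


if __name__ == "__main__":
    # Hypothetical setting: 10^9 vectors of dimension 128, probing 64 lists.
    print(refine_num_clusters(1000, 10**9, 128, 64))
```

For this toy model the optimum has a closed form, K* = sqrt(nprobe · N · c / d), which is about 22,361 for N = 10^9, d = 128, nprobe = 64, and c = 1. The local search approaches that value without using the formula. That property is what makes an iterative method attractive when the performance model is too complex to minimize analytically.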
Pages: 6786-6800
Number of pages: 15
Related papers
  • [1] An efficient indexing technique for billion-scale nearest neighbor search
    Yang, Kaixiang
    Wang, Hongya
    Du, Ming
    Wang, Zhizheng
    Tan, Zongyuan
    Zhang, Jie
    Xiao, Yingyuan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (20) : 31673 - 31689
  • [3] SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search
    Chen, Qi
    Zhao, Bing
    Wang, Haidong
    Li, Mingqin
    Liu, Chuanjie
    Li, Zengzhong
    Yang, Mao
    Wang, Jingdong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
  • [4] Effective product quantization-based indexing for nearest neighbor search
    Chiu, Chih-Yi
    Chiu, Jih-Sheng
    Markchit, Sarawut
    Chou, Sheng-Hao
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 2877 - 2895
  • [6] Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search
    Simhadri, Harsha Vardhan
    Williams, George
    Aumueller, Martin
    Douze, Matthijs
    Babenko, Artem
    Baranchuk, Dmitry
    Chen, Qi
    Hosseini, Lucas
    Krishnaswamy, Ravishankar
    Srinivasa, Gopal
    Subramanya, Suhas Jayaram
    Wang, Jingdong
    NEURIPS 2021 COMPETITIONS AND DEMONSTRATIONS TRACK, VOL 176, 2021, 176 : 177 - 189
  • [7] Processing-In-Hierarchical-Memory Architecture for Billion-Scale Approximate Nearest Neighbor Search
    Zhu, Zhenhua
    Liu, Jun
    Dai, Guohao
    Zeng, Shulin
    Li, Bing
    Yang, Huazhong
    Wang, Yu
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023
  • [8] Nearest Neighbor Search Based on Product Quantization in Clusters
    Liu S.-W.
    Chen W.
    Zhao W.
    Chen J.-C.
    Lu P.
    Jisuanji Xuebao/Chinese Journal of Computers, 2020, 43 (02) : 303 - 314
  • [9] Quantization-Based Approximate Nearest Neighbor Search with Optimized Multiple Residual Codebooks
    Uchida, Yusuke
    Takagi, Koichi
    Kawada, Ryoichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2011, E94D (07) : 1510 - 1514
  • [10] Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor Search
    Jang, Junhyeok
    Choi, Hanjin
    Bae, Hanyeoreum
    Lee, Seungjun
    Kwon, Miryeong
    Jung, Myoungsoo
    ACM TRANSACTIONS ON STORAGE, 2024, 20 (02)