LSH-based distributed similarity indexing with load balancing in high-dimensional space

Times cited: 10
Authors
Wu, Jiagao [1 ,2 ]
Shen, Lu [1 ,2 ]
Liu, Linfeng [1 ,2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp, POB 843, Nanjing 210023, Peoples R China
[2] Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2020 / Vol. 76 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
Locality-sensitive hashing; Similarity search; P2P networks; Load balancing; High-dimensional space; EFFICIENT; SEARCH;
DOI
10.1007/s11227-019-03047-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for solving the similarity search problem in high-dimensional space. Traditionally, these indexing schemes are centrally managed, and multiple hash tables are needed to guarantee search quality. However, because of the limited storage space and processing capacity of the server, centralized indexing schemes become impractical for massive numbers of data objects. Therefore, several distributed indexing schemes based on peer-to-peer (P2P) networks have been proposed, but ensuring load balancing remains one of the key issues. To solve this problem, in this paper we propose two theoretical LSH-based data distribution models in P2P networks for datasets with homogeneous and heterogeneous l2 [...] earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than multiple tables, which has not been considered previously. Then, we propose a static distributed indexing scheme with a novel load-balancing index mapping method based on the cumulative distribution function derived from our models. Furthermore, we propose a dynamic load rebalancing algorithm that uses the virtual-node method of P2P networks to make the static indexing scheme more practical and robust. Experiments on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in high-dimensional similarity indexing.
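The abstract describes mapping LSH hash values to peers through a cumulative distribution function (CDF) so that each peer receives a similar share of the data. Below is a minimal Python sketch of that general idea under stated assumptions; it is not the authors' method. It assumes a single Gaussian (p-stable) LSH function h(v) = floor((a·v + b) / w), estimates an empirical CDF from a sample of hash values in place of the paper's theoretical distribution models, and cuts the hash line at equal quantiles so that each of N peers owns one contiguous range of buckets. All function names (make_lsh, cdf_boundaries, cdf_node, equal_width_node, loads) and parameters (w = 0.25, 4 nodes, Gaussian test data) are hypothetical, chosen for illustration.

```python
import bisect
import math
import random
from typing import Callable, List, Sequence


def make_lsh(dim: int, w: float, rng: random.Random) -> Callable[[Sequence[float]], int]:
    """One p-stable (Gaussian) LSH function h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v: Sequence[float]) -> int:
        proj = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((proj + b) / w)

    return h


def cdf_boundaries(sample_hashes: List[int], num_nodes: int) -> List[int]:
    """Cut points at the k/num_nodes quantiles of the empirical CDF of hash values.

    Assumption: an empirical CDF from a sample stands in for the paper's
    theoretical distribution models.
    """
    s = sorted(sample_hashes)
    n = len(s)
    return [s[(k * n) // num_nodes] for k in range(1, num_nodes)]


def cdf_node(hash_value: int, boundaries: List[int]) -> int:
    """Node whose CDF range contains hash_value; a whole bucket stays on one node."""
    return bisect.bisect_right(boundaries, hash_value)


def equal_width_node(hash_value: int, lo: int, hi: int, num_nodes: int) -> int:
    """Naive baseline: split [lo, hi] into num_nodes ranges of equal width."""
    width = (hi - lo + 1) / num_nodes
    return min(int((hash_value - lo) / width), num_nodes - 1)


def loads(assignments: List[int], num_nodes: int) -> List[int]:
    """Count how many objects each node stores."""
    counts = [0] * num_nodes
    for node in assignments:
        counts[node] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    dim, n, num_nodes = 16, 20000, 4
    data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    h = make_lsh(dim, w=0.25, rng=rng)
    hashes = [h(v) for v in data]

    # Estimate the CDF from a small sample before indexing everything.
    bounds = cdf_boundaries(hashes[:2000], num_nodes)
    cdf_loads = loads([cdf_node(x, bounds) for x in hashes], num_nodes)

    lo, hi = min(hashes), max(hashes)
    naive_loads = loads([equal_width_node(x, lo, hi, num_nodes) for x in hashes], num_nodes)

    print("CDF-based loads:  ", cdf_loads)
    print("equal-width loads:", naive_loads)
```

Because projected values concentrate around the center, the equal-width split overloads the middle nodes, while the quantile cuts give each node roughly n/N objects. Assigning whole buckets to one node keeps colliding vectors on the same peer, at the cost that a single very heavy bucket cannot be split. This sketch covers only a static mapping and does not model the dynamic virtual-node rebalancing that the abstract also mentions.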
Pages: 636 - 665
Page count: 30
Related papers
50 in total
  • [21] Efficient indexing of binary LSH for high dimensional nearest neighbor
    Zhang, Xiaoyu
    Wang, Manlin
    Cui, Jiangtao
    NEUROCOMPUTING, 2016, 213 : 24 - 33
  • [22] Distributed Online Similarity Search in High Dimensional Space
    Li, Baohui
    Xu, Kefu
    Xie, Hongtao
    2014 INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2014, : 204 - +
  • [23] A Distributed Near-Optimal LSH-based Framework for Privacy-Preserving Record Linkage
    Karapiperis, Dimitrios
    Verykios, Vassilios S.
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2014, 11 (02) : 745 - 763
  • [24] Indexing the solution space: A new technique for nearest neighbor search in high-dimensional space
    Berchtold, S
    Keim, DA
    Kriegel, HP
    Seidl, T
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2000, 12 (01) : 45 - 57
  • [25] Indexing very high-dimensional sparse and quasi-sparse vectors for similarity searches
    Wang C.
    Wang X.S.
    The VLDB Journal, 2001, 9 (4) : 344 - 361
  • [26] Scalable high-dimensional indexing with Hadoop
    Shestakov, Denis
    Moise, Diana
    Gudmundsson, Gylfi
    Amsaleg, Laurent
    2013 11TH INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING (CBMI 2013), 2013, : 207 - 212
  • [27] High-Dimensional Indexing by Sparse Approximation
    Borges, Pedro
    Mourao, Andre
    Magalhaes, Joao
    ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2015, : 163 - 170
  • [28] Indexing very high-dimensional sparse and quasi-sparse vectors for similarity searches
    Wang, CZ
    Wang, XS
    VLDB JOURNAL, 2001, 9 (04): : 344 - 361
  • [29] DForest: A Minimal Dimensionality-Aware Indexing for High-Dimensional Exact Similarity Search
    Li, Lingli
    Sun, Wenjing
    Wu, Baohua
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (10) : 5092 - 5105
  • [30] EncSIM: An Encrypted Similarity Search Service for Distributed High-dimensional Datasets
    Liu, Xiaoning
    Yuan, Xingliang
    Wang, Cong
    2017 IEEE/ACM 25TH INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE (IWQOS), 2017,