LSH-based distributed similarity indexing with load balancing in high-dimensional space

Cited by: 10
Authors
Wu, Jiagao [1 ,2 ]
Shen, Lu [1 ,2 ]
Liu, Linfeng [1 ,2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp, POB 843, Nanjing 210023, Peoples R China
[2] Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2020 / Vol. 76 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
Locality-sensitive hashing; Similarity search; P2P networks; Load balancing; High-dimensional space; EFFICIENT; SEARCH;
DOI
10.1007/s11227-019-03047-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for solving the similarity search problem in high-dimensional space. Traditionally, these indexing schemes are centrally managed, and multiple hash tables are needed to guarantee search quality. However, because the storage space and processing capacity of a single server are limited, centralized indexing schemes become impractical for massive numbers of data objects. Several distributed indexing schemes based on peer-to-peer (P2P) networks have therefore been proposed, but ensuring load balancing remains one of the key issues. To solve this problem, we propose two theoretical LSH-based data distribution models in P2P networks for datasets with homogeneous and heterogeneous l2 norms, respectively. Unlike earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than multiple tables, which has not been considered previously. We then propose a static distributed indexing scheme with a novel load-balancing index mapping method based on the cumulative distribution function derived from our models. Furthermore, we propose a dynamic load rebalancing algorithm that uses the virtual node method of P2P networks to make the static indexing scheme more practical and robust. Experiments on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in similarity indexing of high-dimensional space.
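The idea of a load-balancing mapping based on a cumulative distribution function can be illustrated with a minimal sketch. This is not the paper's model or algorithm. It assumes a single p-stable (Gaussian) LSH projection, data vectors that share one l2 norm and have uniformly random directions, and a Gaussian approximation of the projected values. All function names and parameters (`node_by_cdf`, `node_by_equal_width`, `DIM`, `NORM`, `N_NODES`) are hypothetical. Passing each projected value through the assumed CDF gives a key that is roughly uniform on (0, 1), so equal key ranges receive roughly equal numbers of objects. An equal-width split of the projection range is included as a baseline for comparison.

```python
import math
import random

def normal_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def project(a, v):
    """Inner product a . v, the core of a p-stable LSH function for l2."""
    return sum(ai * vi for ai, vi in zip(a, v))

def node_by_cdf(a, v, sigma, n_nodes):
    """Assign v to a node by mapping its projection through the assumed CDF."""
    key = normal_cdf(project(a, v), sigma)  # roughly uniform on (0, 1)
    return min(int(key * n_nodes), n_nodes - 1)

def node_by_equal_width(a, v, sigma, n_nodes):
    """Baseline: cut [-3*sigma, 3*sigma] into equal-width intervals."""
    x = max(-3.0 * sigma, min(3.0 * sigma, project(a, v)))
    return min(int((x + 3.0 * sigma) / (6.0 * sigma) * n_nodes), n_nodes - 1)

def random_vector_with_norm(dim, norm, rng):
    """Random direction scaled to a fixed l2 norm (the homogeneous-norm case)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = norm / math.sqrt(sum(x * x for x in g))
    return [x * scale for x in g]

def load_imbalance(assign, data, n_nodes):
    """Heaviest node's load divided by the mean load (1.0 is perfect balance)."""
    loads = [0] * n_nodes
    for v in data:
        loads[assign(v)] += 1
    return max(loads) / (len(data) / n_nodes)

rng = random.Random(0)
DIM, NORM, N_NODES = 64, 5.0, 8  # illustrative values only

# Random Gaussian projection vector a of the LSH function.
a = [rng.gauss(0.0, 1.0) for _ in range(DIM)]

# Assumed spread of a . v over the dataset for this fixed a:
# for uniformly random directions it is about NORM * ||a|| / sqrt(DIM).
sigma = NORM * math.sqrt(sum(x * x for x in a)) / math.sqrt(DIM)

data = [random_vector_with_norm(DIM, NORM, rng) for _ in range(4000)]

cdf_ratio = load_imbalance(
    lambda v: node_by_cdf(a, v, sigma, N_NODES), data, N_NODES)
width_ratio = load_imbalance(
    lambda v: node_by_equal_width(a, v, sigma, N_NODES), data, N_NODES)

print("CDF mapping imbalance:", round(cdf_ratio, 3))
print("Equal-width imbalance:", round(width_ratio, 3))
```

Under these assumptions the CDF mapping should keep the heaviest node close to the mean load, while the equal-width split concentrates objects on the nodes that cover the centre of the projection range. Real datasets rarely match the assumed distribution exactly, which is consistent with the abstract's motivation for an additional dynamic rebalancing step.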
Pages: 636 - 665
Number of pages: 30
Related papers
50 in total
  • [31] Vector Approximation based Indexing for High-Dimensional Multimedia Databases
    Daoudi, I.
    Ouatik, S. E.
    El Kharraz, A.
    Idrissi, K.
    Aboutajdine, D.
    ENGINEERING LETTERS, 2008, 16 (02)
  • [32] High-dimensional similarity joins
    Shim, K
    Srikant, R
    Agrawal, R
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, 14 (01) : 156 - 171
  • [33] High-dimensional similarity joins
    Shim, K
    Srikant, R
    Agrawal, R
    13TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING - PROCEEDINGS, 1997, : 301 - 311
  • [34] Spatial indexing of high-dimensional data based on relative approximation
    Yasushi Sakurai
    Masatoshi Yoshikawa
    Shunsuke Uemura
    Haruhiko Kojima
    The VLDB Journal, 2002, 11 : 93 - 108
  • [35] Spatial indexing of high-dimensional data based on relative approximation
    Sakurai, Y
    Yoshikawa, M
    Uemura, S
    Kojima, H
    VLDB JOURNAL, 2002, 11 (02): : 93 - 108
  • [36] PM-LSH: A Fast and Accurate LSH Framework for High-Dimensional Approximate NN Search
    Zheng, Bolong
    Zhao, Xi
    Weng, Lianggui
    Nguyen Quoc Viet Hung
    Liu, Hang
    Jensen, Christian S.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (05): : 643 - 655
  • [37] LSH-based private data protection for service quality with big range in distributed educational service recommendations
    Chao Yan
    Xuening Chen
    Qinglei Kong
    EURASIP Journal on Wireless Communications and Networking, 2019
  • [38] LOAD: LSH-Based l0-Sampling over Stream Data with Near-Duplicates
    Lurong, Dingzhu
    Wen, Yanlong
    Zhang, Jiangwei
    Yuan, Xiaojie
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT I, 2021, 12457 : 473 - 489
  • [39] SPY-TEC: An efficient indexing method for similarity search in high-dimensional data spaces
    Lee, DH
    Kim, HJ
    DATA & KNOWLEDGE ENGINEERING, 2000, 34 (01) : 77 - 97
  • [40] GMM-ClusterForest: A Novel Indexing Approach for Multi-features Based Similarity Search in High-Dimensional Spaces
    Wan, Yuchai
    Liu, Xiabi
    Tong, Kunqi
    Wei, Xue
    Wu, Yi
    Guan, Fei
    Pang, Kunpeng
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT II, 2012, 7664 : 210 - 217