LSH-based distributed similarity indexing with load balancing in high-dimensional space

Cited by: 10
Authors
Wu, Jiagao [1 ,2 ]
Shen, Lu [1 ,2 ]
Liu, Linfeng [1 ,2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp, POB 843, Nanjing 210023, Peoples R China
[2] Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2020 / Vol. 76 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
Locality-sensitive hashing; Similarity search; P2P networks; Load balancing; High-dimensional space; EFFICIENT; SEARCH;
DOI
10.1007/s11227-019-03047-6
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for solving the similarity search problem in high-dimensional space. Traditionally, these indexing schemes are centrally managed, and multiple hash tables are needed to guarantee the search quality. However, due to the limited storage space and processing capacity of the server, centralized indexing schemes become impractical for massive data objects. Therefore, several distributed indexing schemes based on peer-to-peer (P2P) networks have been proposed, but ensuring load balancing remains one of the key issues. To solve the problem, in this paper, we propose two theoretical LSH-based data distribution models in P2P networks for datasets with homogeneous and heterogeneous l2 [...] earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than multiple tables, which has not been considered previously. Then, we propose a static distributed indexing scheme with a novel load-balancing index mapping method based on the cumulative distribution function derived from our models. Furthermore, we propose a dynamic load rebalancing algorithm using the virtual node method of P2P networks to make the static indexing scheme more practical and robust. Experiments on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in similarity indexing of high-dimensional space.
Pages: 636-665
Number of pages: 30
Related papers
50 in total
  • [41] Adaptive Indexing in High-Dimensional Metric Spaces
    Lampropoulos, Konstantinos
    Zardbani, Fatemeh
    Mamoulis, Nikos
    Karras, Panagiotis
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (10): : 2525 - 2537
  • [42] Subspace indexing for extremely high-dimensional CBIR
    Wichert, Andrzej
    2008 INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING, 2008, : 314 - 321
  • [43] New high-dimensional indexing structure based on principal component sorting
    School of Computer Science and Engineering, Xidian Univ., Xi'an 710071, China
    Xi Tong Cheng Yu Dian Zi Ji Shu/Syst Eng Electron, 2006, 12 (1927-1931):
  • [44] KSR-Tree: A Clustering based High-dimensional Indexing Approach
    Zhang, Wei
    Wang, Hanhu
    Li, Hui
    INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY II, PTS 1-4, 2013, 411-414 : 366 - 369
  • [45] High-dimensional indexing method based on elliptical-shaped clustering
    Cui, Jiang-Tao
    Guo, Yong
    Zhou, Shui-Sheng
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2010, 23 (04): : 483 - 490
  • [46] High-Dimensional Indexing Algorithm Based on the Hyperplane Tree-structure
    Liu, Lian
    Xiang, Fenghong
    Mao, Jianlin
    Zhang, Maoxing
    2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 2730 - 2733
  • [47] LSH-based private data protection for service quality with big range in distributed educational service recommendations
    Yan, Chao
    Chen, Xuening
    Kong, Qinglei
    EURASIP JOURNAL ON WIRELESS COMMUNICATIONS AND NETWORKING, 2019, 2019 (1)
  • [48] Progressive high-dimensional similarity join
    Tok, Wee Hyong
    Bressan, Stephane
    Lee, Mong-Li
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2007, 4653 : 233 - +
  • [49] A High-Dimensional Particle Swarm Optimization Based on Similarity Measurement
    Feng, Jiqiang
    Lai, Guixiang
    Cheng, Shi
    Zhang, Feng
    Sun, Yifei
    ADVANCES IN SWARM INTELLIGENCE, ICSI 2017, PT I, 2017, 10385 : 180 - 188
  • [50] Resampling-Based Similarity Measures for High-Dimensional Data
    Amaratunga, Dhammika
    Cabrera, Javier
    Lee, Yung-Seop
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2015, 22 (01) : 54 - 62