LSH-based distributed similarity indexing with load balancing in high-dimensional space

Cited by: 10
Authors
Wu, Jiagao [1, 2]
Shen, Lu [1, 2]
Liu, Linfeng [1, 2]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp, POB 843, Nanjing 210023, Peoples R China
[2] Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing 210023, Peoples R China
Source
JOURNAL OF SUPERCOMPUTING | 2020 / Vol. 76 / Issue 01
Funding
National Natural Science Foundation of China;
Keywords
Locality-sensitive hashing; Similarity search; P2P networks; Load balancing; High-dimensional space; EFFICIENT; SEARCH;
DOI
10.1007/s11227-019-03047-6
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Locality-sensitive hashing (LSH) and its variants are well-known indexing schemes for similarity search in high-dimensional space. Traditionally, these schemes are centrally managed, and multiple hash tables are needed to guarantee search quality. However, because a single server has limited storage space and processing capacity, centralized indexing becomes impractical for massive numbers of data objects. Several distributed indexing schemes based on peer-to-peer (P2P) networks have therefore been proposed, but ensuring load balancing remains one of the key issues. To address this problem, we propose two theoretical LSH-based data distribution models in P2P networks for datasets with homogeneous and heterogeneous l2 norms. Unlike earlier schemes, to our knowledge, we focus on load balancing for a single hash table rather than multiple tables, which has not been considered previously. We then propose a static distributed indexing scheme with a novel load-balancing index mapping method based on the cumulative distribution function derived from our models. Furthermore, we propose a dynamic load rebalancing algorithm that uses the virtual node method of P2P networks to make the static indexing scheme more practical and robust. Experiments on synthetic and real datasets show that the proposed distributed similarity indexing schemes are effective and efficient for load balancing in similarity indexing of high-dimensional space.
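To make the idea of a CDF-based index mapping concrete, the following is a minimal sketch, not the authors' scheme. It assumes that the LSH projection value of a data object with l2 norm r behaves like a draw from a normal distribution N(0, r^2), which is the 2-stable property of Gaussian projections, and it maps each value through that CDF so that every node covers an equal share of probability mass. The names `node_for` and `simulate_loads`, the node count, and the sample size are all hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist


def node_for(projection, norm, num_nodes):
    """Map a projection value to a node index via the assumed N(0, norm^2) CDF.

    The CDF turns the bell-shaped projection values into values that are
    uniform on [0, 1], so splitting [0, 1] into equal slices gives every
    node the same expected load.
    """
    u = NormalDist(mu=0.0, sigma=norm).cdf(projection)
    # Clamp so that u == 1.0 still lands on the last node.
    return min(int(u * num_nodes), num_nodes - 1)


def simulate_loads(num_points=20000, norm=1.0, num_nodes=8, seed=0):
    """Count objects per node when projections follow the assumed model."""
    rng = random.Random(seed)
    loads = [0] * num_nodes
    for _ in range(num_points):
        # Assumption: projections are drawn directly from N(0, norm^2).
        projection = rng.gauss(0.0, norm)
        loads[node_for(projection, norm, num_nodes)] += 1
    return loads


if __name__ == "__main__":
    print(simulate_loads())
```

Under this assumed model the per-node counts come out close to equal, whereas splitting the raw projection range into equal-width intervals would concentrate most objects on the nodes near zero. Real datasets may deviate from the normal model, which is the kind of situation a dynamic rebalancing step is meant to handle.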
Pages: 636 - 665
Page count: 30
Related papers
50 in total
  • [1] LSH-based distributed similarity indexing with load balancing in high-dimensional space
    Jiagao Wu
    Lu Shen
    Linfeng Liu
    The Journal of Supercomputing, 2020, 76 : 636 - 665
  • [2] Towards Load Balancing for LSH-based Distributed Similarity Indexing in High-dimensional Space
    Shen, Lu
    Wu, Jiagao
    Wang, Yongrong
    Liu, Linfeng
    IEEE 20TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS / IEEE 16TH INTERNATIONAL CONFERENCE ON SMART CITY / IEEE 4TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS (HPCC/SMARTCITY/DSS), 2018, : 384 - 391
  • [3] Data Independent Method of Constructing Distributed LSH for Large-Scale Dynamic High-Dimensional Indexing
    Gu, Xiaoguang
    Zhang, Lei
    Zhang, Dongming
    Zhang, Yongdong
    Li, Jintao
    Bao, Ning
    2012 IEEE 14TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2012 IEEE 9TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (HPCC-ICESS), 2012, : 564 - 571
  • [4] A Generic Method for Accelerating LSH-Based Similarity Join Processing
    Yu, Chenyun
    Nutanong, Sarana
    Li, Hangyu
    Wang, Cong
    Yuan, Xingliang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (04) : 712 - 726
  • [5] Accelerating LSH-based Distributed Search with In-network Computation
    Zhang, Penghao
    Pan, Heng
    Li, Zhenyu
    He, Peng
    Zhang, Zhibin
    Tyson, Gareth
    Xie, Gaogang
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2021), 2021,
  • [6] High-dimensional similarity searches using query driven dynamic quantization and distributed indexing
    Guzun, Gheorghi
    Canahuate, Guadalupe
    DISTRIBUTED AND PARALLEL DATABASES, 2020, 38 (02) : 255 - 286
  • [7] High-dimensional similarity searches using query driven dynamic quantization and distributed indexing
    Gheorghi Guzun
    Guadalupe Canahuate
    Distributed and Parallel Databases, 2020, 38 : 255 - 286
  • [8] NetSHa: In-Network Acceleration of LSH-Based Distributed Search
    Zhang, Penghao
    Pan, Heng
    Li, Zhenyu
    Cui, Penglai
    Jia, Ru
    He, Peng
    Zhang, Zhibin
    Tyson, Gareth
    Xie, Gaogang
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (09) : 2213 - 2229
  • [9] An efficient LSH indexing on discriminative short codes for high-dimensional nearest neighbors
    Feng Xiaokang
    Cui Jiangtao
    Li Hui
    Liu Yingfan
    Multimedia Tools and Applications, 2019, 78 : 24407 - 24429
  • [10] An efficient LSH indexing on discriminative short codes for high-dimensional nearest neighbors
    Feng Xiaokang
    Cui Jiangtao
    Li Hui
    Liu Yingfan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (17) : 24407 - 24429