Data Placement Strategies that Speed-Up Distributed Graph Query Processing

Cited by: 2
Authors
Janke, Daniel [1]
Staab, Steffen [2, 3]
Leinberger, Martin [1]
Affiliations
[1] Univ Koblenz Landau, Inst Web Sci & Technol, Mainz, Germany
[2] Univ Stuttgart, Inst Parallel & Distributed Syst, Stuttgart, Germany
[3] Univ Southampton, Web & Internet Sci Res Grp, Southampton, Hants, England
Keywords
distributed RDF store; graph cover; graph partitioning
DOI
10.1145/3391274.3393633
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
We consider the problem of how to optimize data distribution to improve query performance in distributed RDF stores running on compute node clusters. When hash-based data distribution strategies are used, the query workload tends to be evenly balanced among all compute nodes, whereas graph-clustering-based approaches reduce the number of transferred intermediate results. Our hypothesis is that data distribution strategies that collocate entities in small sets of closely connected data items may be able to combine the advantages of both strategies. To investigate this hypothesis, we analyze two such data distribution strategies: (1) overpartitioned minimal edge-cut cover and (2) our novel molecule hash cover. Our analysis substantiates our hypothesis by explaining the causes of their good performance. Both strategies reduce query execution time on our set of test queries (by between 5% and 98%). While overpartitioned minimal edge-cut cover fares best when it can be computed, it may lack scalability for large datasets. Our novel molecule hash cover combines scalability with major improvements in query execution time over various baseline strategies.
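The trade-off the abstract describes for hash-based distribution can be illustrated with a minimal sketch. It is not the authors' molecule hash cover or their minimal edge-cut cover, whose details this record does not give. It is a plain subject-hash placement, with hypothetical names (`TRIPLES`, `hash_cover`, `node_loads`, `cut_triples`) and made-up example data, and it assumes that a triple whose object is stored as a subject on another node is a reasonable proxy for an intermediate result that would have to be transferred.

```python
import zlib
from collections import Counter

# Hypothetical example data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("acme", "locatedIn", "mainz"),
]


def node_of(term, num_nodes):
    """Map a term to a compute node with a deterministic hash."""
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def hash_cover(triples, num_nodes):
    """Place each triple on the node chosen by hashing its subject."""
    return {t: node_of(t[0], num_nodes) for t in triples}


def node_loads(assignment, num_nodes):
    """Number of triples stored on each node (a rough balance measure)."""
    counts = Counter(assignment.values())
    return [counts.get(n, 0) for n in range(num_nodes)]


def cut_triples(triples, num_nodes):
    """Count triples whose object is a subject placed on a different node."""
    subjects = {s for s, _, _ in triples}
    return sum(
        1
        for s, _, o in triples
        if o in subjects and node_of(o, num_nodes) != node_of(s, num_nodes)
    )


assignment = hash_cover(TRIPLES, num_nodes=3)
loads = node_loads(assignment, num_nodes=3)
crossing = cut_triples(TRIPLES, num_nodes=3)
```

Comparing `loads` with `crossing` for different placements shows the tension the abstract names: hashing tends to spread triples evenly but ignores graph structure, while clustering-based placements aim to keep `crossing` low.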
Pages: 6