Data Placement Strategies that Speed-Up Distributed Graph Query Processing

Cited by: 2
Authors
Janke, Daniel [1]
Staab, Steffen [2, 3]
Leinberger, Martin [1]
Affiliations
[1] Univ Koblenz Landau, Inst Web Sci & Technol, Mainz, Germany
[2] Univ Stuttgart, Inst Parallel & Distributed Syst, Stuttgart, Germany
[3] Univ Southampton, Web & Internet Sci Res Grp, Southampton, Hants, England
Keywords
distributed RDF store; graph cover; graph partitioning
DOI
10.1145/3391274.3393633
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of how to optimize data distribution to improve query performance in distributed RDF stores running on clusters of compute nodes. With hash-based data distribution strategies, the query workload tends to be balanced equally among all compute nodes, whereas graph-clustering-based approaches reduce the number of transferred intermediate results. Our hypothesis is that data distribution strategies that collocate entities in small sets of closely connected data items may be able to combine the advantages of both. To investigate this hypothesis, we analyze two such strategies: (1) the overpartitioned minimal edge-cut cover and (2) our novel molecule hash cover. Our analysis substantiates the hypothesis by explaining the causes of their good performance. Both strategies reduce query execution time on our set of test queries, by between 5% and 98%. While the overpartitioned minimal edge-cut cover fares best when it can be computed, it may lack scalability for large datasets. Our novel molecule hash cover combines scalability with major improvements in query execution time over various baseline strategies.
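The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it does not reproduce the molecule hash cover or the minimal edge-cut cover. It assumes a toy set of invented triples, a hypothetical `hash_cover` function that assigns each triple to a compute node by hashing its subject (CRC32 is an arbitrary choice), and a hypothetical `cut_edges` count used as a rough proxy for intermediate results that would have to be transferred between nodes.

```python
import zlib

# Toy RDF-like triples (subject, predicate, object); invented for illustration.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("acme", "locatedIn", "mainz"),
    ("alice", "worksAt", "acme"),
    ("dave", "knows", "alice"),
]


def node_of(term: str, num_nodes: int) -> int:
    """Deterministically map a term to a compute node (CRC32 is an assumption)."""
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def hash_cover(triples, num_nodes):
    """Assign each triple to the node chosen by hashing its subject."""
    chunks = {n: [] for n in range(num_nodes)}
    for s, p, o in triples:
        chunks[node_of(s, num_nodes)].append((s, p, o))
    return chunks


def cut_edges(chunks, num_nodes):
    """Count triples whose object is a subject stored on a different node."""
    subjects = {s for chunk in chunks.values() for s, _, _ in chunk}
    cut = 0
    for n, chunk in chunks.items():
        for _, _, o in chunk:
            if o in subjects and node_of(o, num_nodes) != n:
                cut += 1
    return cut


chunks = hash_cover(TRIPLES, num_nodes=2)
sizes = {n: len(c) for n, c in chunks.items()}
print("triples per node:", sizes)
print("cut edges:", cut_edges(chunks, 2))
```

A graph-clustering-based cover would aim to lower the cut-edge count, possibly at the cost of a less evenly balanced workload; the strategies analyzed in the paper aim to combine both advantages.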
Number of pages: 6