Data Placement Strategies that Speed-Up Distributed Graph Query Processing

Cited by: 2
Authors
Janke, Daniel [1]
Staab, Steffen [2, 3]
Leinberger, Martin [1]
Affiliations
[1] Univ Koblenz Landau, Inst Web Sci & Technol, Mainz, Germany
[2] Univ Stuttgart, Inst Parallel & Distributed Syst, Stuttgart, Germany
[3] Univ Southampton, Web & Internet Sci Res Grp, Southampton, Hants, England
Keywords
distributed RDF store; graph cover; graph partitioning
DOI
10.1145/3391274.3393633
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of how to optimize data distribution to improve query performance in distributed RDF stores running on compute node clusters. Hash-based data distribution strategies tend to balance the query workload equally among all compute nodes, whereas graph-clustering-based approaches reduce the number of transferred intermediate results. Our hypothesis is that data distribution strategies that collocate entities in small sets of closely connected data items may combine the advantages of both. To investigate this hypothesis, we analyze two such strategies: (1) the overpartitioned minimal edge-cut cover and (2) our novel molecule hash cover. Our analysis substantiates the hypothesis by explaining the causes of their good performance. Both strategies reduce query execution time on our set of test queries, by between 5% and 98%. The overpartitioned minimal edge-cut cover fares best when it can be computed, but it may not scale to large datasets. Our novel molecule hash cover combines scalability with major improvements in query execution time over various baseline strategies.
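The trade-off described in the abstract can be illustrated with a minimal sketch of a plain hash-based placement. This is not the authors' implementation and not the molecule hash cover. It assumes that each triple is placed on the compute node obtained by hashing its subject. The names `node_for`, `hash_cover`, and `cut_triples`, and the example triples, are hypothetical. The count of triples whose subject and object land on different nodes serves only as a rough proxy for the intermediate results a query would have to transfer.

```python
import hashlib


def node_for(term: str, num_nodes: int) -> int:
    """Map an RDF term to a compute node using a stable hash (assumption)."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def hash_cover(triples, num_nodes):
    """Place each (subject, predicate, object) triple on its subject's node."""
    chunks = {n: [] for n in range(num_nodes)}
    for s, p, o in triples:
        chunks[node_for(s, num_nodes)].append((s, p, o))
    return chunks


def cut_triples(triples, num_nodes):
    """Count triples whose subject and object hash to different nodes."""
    return sum(
        1 for s, _, o in triples
        if node_for(s, num_nodes) != node_for(o, num_nodes)
    )


# Hypothetical toy data set, for illustration only.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksAt", "ex:uni"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:carol", "ex:worksAt", "ex:uni"),
]

chunks = hash_cover(triples, num_nodes=3)
for node, chunk in chunks.items():
    print(node, len(chunk))
print("cut triples:", cut_triples(triples, num_nodes=3))
```

Hashing spreads subjects across nodes without looking at the graph structure, which favors balance but ignores connectivity. A clustering-based placement would instead try to minimize the cut count, usually at a higher computation cost.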
Pages: 6