Data Placement Strategies that Speed-Up Distributed Graph Query Processing

Cited by: 2
Authors
Janke, Daniel [1]
Staab, Steffen [2, 3]
Leinberger, Martin [1]
Affiliations
[1] Univ Koblenz Landau, Inst Web Sci & Technol, Mainz, Germany
[2] Univ Stuttgart, Inst Parallel & Distributed Syst, Stuttgart, Germany
[3] Univ Southampton, Web & Internet Sci Res Grp, Southampton, Hants, England
Keywords
distributed RDF store; graph cover; graph partitioning
DOI
10.1145/3391274.3393633
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of how to optimize data distribution to improve query performance in distributed RDF stores running on compute node clusters. When hash-based data distribution strategies are used, the query workload tends to be balanced equally among all compute nodes, whereas graph-clustering-based approaches reduce the number of transferred intermediate results. Our hypothesis is that data distribution strategies that collocate entities in small sets of closely connected data items may be able to combine the advantages of both strategies. To investigate this hypothesis, we analyze two such data distribution strategies: (1) overpartitioned minimal edge-cut cover and (2) our novel molecule hash cover. Our analysis substantiates our hypothesis by explaining the causes of their good performance. Both strategies reduce query execution time on our set of test queries (by between 5% and 98%). While overpartitioned minimal edge-cut cover fares best when it can be computed, it may lack scalability for large datasets. Our novel molecule hash cover combines scalability with major improvements in query execution time over various baseline strategies.
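As a rough illustration of the trade-off the abstract describes, the sketch below places RDF triples on compute nodes by hashing each triple's subject, then reports the load per node and the number of edges that cross node boundaries. This is a plain subject-hash cover written for illustration only; it is not the paper's molecule hash cover or its minimal edge-cut cover, and the function names (`node_for`, `hash_cover`, `load_per_node`, `cut_edges`) and the toy data are assumptions, not taken from the source.

```python
import zlib
from collections import Counter


def node_for(subject, num_nodes):
    """Map a subject to a compute node with a deterministic hash (CRC32)."""
    return zlib.crc32(subject.encode("utf-8")) % num_nodes


def hash_cover(triples, num_nodes):
    """Assign every (subject, predicate, object) triple to the node of its subject.

    Assumes the triples are distinct; duplicates would collapse into one key.
    """
    return {t: node_for(t[0], num_nodes) for t in triples}


def load_per_node(cover, num_nodes):
    """Return the number of triples stored on each node, indexed by node id."""
    counts = Counter(cover.values())
    return [counts[n] for n in range(num_nodes)]


def cut_edges(triples, num_nodes):
    """Count triples whose object is also a subject placed on a different node.

    Each such edge is a place where a query following the edge would have to
    transfer intermediate results between nodes.
    """
    subjects = {s for s, _, _ in triples}
    return sum(
        1
        for s, _, o in triples
        if o in subjects and node_for(o, num_nodes) != node_for(s, num_nodes)
    )


# Hypothetical toy graph, used only to exercise the functions above.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("acme", "locatedIn", "mainz"),
]

if __name__ == "__main__":
    cover = hash_cover(TRIPLES, 2)
    print("triples per node:", load_per_node(cover, 2))
    print("cut edges:", cut_edges(TRIPLES, 2))
```

Hashing on the subject keeps all triples of one entity together and tends to spread load evenly, but it ignores graph structure, so connected entities often land on different nodes. A graph-clustering-based placement would aim to lower the `cut_edges` count, possibly at the cost of a less even load.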
Pages: 6