A Stream Partitioning Approach to Processing Large Scale Distributed Graph Datasets

Cited by: 0
Authors
Wang, Rui [1]
Chiu, Kenneth [1]
Affiliation
[1] SUNY Binghamton, Dept Comp Sci, Binghamton, NY 13901 USA
Keywords
communication cost; dataset partitioning; online algorithm; graph partitioning; large scale; RDF dataset
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
RDF datasets are an important source of big data, but many of them are too large to fit on a single machine. One way to address this is to partition the RDF graph across multiple machines, with each component residing on a single machine. A poor partition, however, can incur significant communication costs if many queries end up involving multiple machines. A number of existing partitioning schemes seek to reduce these costs by finding partitions that avoid cutting edges in the RDF graph. While these schemes can find good partitions, the partitioning process itself is often not very scalable and cannot handle incrementally generated RDF data. In this paper, we develop a more scalable, effective, and low-complexity approach, online graph dataset partitioning, which produces high-quality dataset partitions with fewer links between partitions. We show experimentally that it is effective at reducing the communication cost of query processing while also improving the scalability of the partitioning itself.
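The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of online (streaming) partitioning, using a generic greedy heuristic and not the authors' method. Each triple is seen once, in arrival order, and a new vertex is placed next to an already-placed neighbour when that partition still has room. The function name `stream_partition`, the `capacity` parameter, and the use of a simple vertex-count load are all assumptions made for illustration.

```python
def stream_partition(triples, k, capacity):
    """Greedily assign vertices to k partitions as (s, p, o) triples arrive.

    Hypothetical illustration of one-pass partitioning; not the paper's algorithm.
    Returns the vertex -> partition map and the number of cut edges.
    """
    part_of = {}       # vertex -> partition index
    load = [0] * k     # number of vertices currently held by each partition

    def place(vertex, preferred=None):
        # A vertex is placed once; later sightings reuse the first decision.
        if vertex in part_of:
            return part_of[vertex]
        # Prefer the neighbour's partition while it has room; otherwise use
        # the least-loaded partition (capacity is a soft limit in this sketch).
        if preferred is not None and load[preferred] < capacity:
            target = preferred
        else:
            target = min(range(k), key=lambda i: load[i])
        part_of[vertex] = target
        load[target] += 1
        return target

    cut = 0
    for s, _p, o in triples:
        if s in part_of:
            place(o, part_of[s])
        elif o in part_of:
            place(s, part_of[o])
        else:
            place(o, place(s))
        # An edge whose endpoints live on different machines is a cut edge,
        # i.e. a potential source of cross-machine communication at query time.
        if part_of[s] != part_of[o]:
            cut += 1
    return part_of, cut
```

Because each decision uses only the assignments made so far, a sketch like this can keep accepting triples as they are generated, which is the property the abstract contrasts with offline schemes. The price is that early placements are never revisited, so partition quality depends on arrival order.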
Pages: 6