A Stream Partitioning Approach to Processing Large Scale Distributed Graph Datasets

Cited by: 0
Authors
Wang, Rui [1 ]
Chiu, Kenneth [1 ]
Affiliations
[1] SUNY Binghamton, Dept Comp Sci, Binghamton, NY 13901 USA
Keywords
communication cost; dataset partitioning; online algorithm; graph partitioning; large scale; RDF dataset
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
RDF datasets are an important source of big data. Many of them, however, are too large to fit on a single machine. One approach is to partition the RDF graph across multiple machines, with each component residing on a single machine. A poor partition, however, can incur significant communication costs if many queries end up involving multiple machines. A number of existing partitioning schemes seek to reduce these costs by finding partitions that avoid cutting edges in the RDF graph. While these schemes can find good partitions, the partitioning process itself is often not very scalable and cannot handle incrementally generated RDF data. In this paper, we develop a more scalable, effective, and low-complexity approach, online graph dataset partitioning, which produces high-quality dataset partitions with fewer links between partitions. We show experimentally that it reduces the communication cost of query processing while also improving the scalability of the partitioning itself.
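The abstract does not spell out the algorithm, so the following is only a minimal sketch of what an online (streaming) partitioner for RDF triples could look like. It uses a generic greedy heuristic: each newly seen vertex goes to the partition that already holds most of its known neighbors, discounted by how full that partition is. The function names `stream_partition` and `cut_edges`, the `capacity` parameter, and the scoring rule are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def stream_partition(triples, k, capacity):
    """Assign each vertex to one of k partitions in a single pass.

    Hypothetical greedy rule (an assumption, not the paper's method):
    score(i) = (#already-placed neighbors in partition i) * (1 - size_i / capacity)
    """
    assignment = {}                 # vertex -> partition index
    sizes = [0] * k                 # number of vertices per partition
    neighbors = defaultdict(set)    # adjacency seen so far in the stream

    def place(v):
        if v in assignment:
            return
        counts = [0] * k
        for n in neighbors[v]:
            if n in assignment:
                counts[assignment[n]] += 1
        scores = [counts[i] * (1 - sizes[i] / capacity) for i in range(k)]
        # Highest score wins; ties go to the least-loaded partition.
        best = max(range(k), key=lambda i: (scores[i], -sizes[i]))
        assignment[v] = best
        sizes[best] += 1

    for s, _p, o in triples:
        neighbors[s].add(o)
        neighbors[o].add(s)
        place(s)
        place(o)
    return assignment


def cut_edges(triples, assignment):
    """Count triples whose subject and object live on different partitions."""
    return sum(1 for s, _p, o in triples if assignment[s] != assignment[o])
```

Because each triple is handled once on arrival, a sketch like this can also accept incrementally generated data. The number of cut edges returned by `cut_edges` serves as a rough proxy for the cross-machine communication that the abstract aims to reduce.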
Pages: 6
Related papers
50 in total
  • [1] A Distributed Graph Partitioning Algorithm for Processing Large Graphs
    Chen, Tefeng
    Li, Bo
    PROCEEDINGS 2016 IEEE SYMPOSIUM ON SERVICE-ORIENTED SYSTEM ENGINEERING SOSE 2016, 2016, : 71 - 77
  • [2] A Distributed Algorithm for Large-Scale Graph Partitioning
    Rahimian, Fatemeh
    Payberah, Amir H.
    Girdzijauskas, Sarunas
    Jelasity, Mark
    Haridi, Seif
    ACM TRANSACTIONS ON AUTONOMOUS AND ADAPTIVE SYSTEMS, 2015, 10 (02)
  • [3] An efficient approach for large scale graph partitioning
    Loureiro, Renzo Z.
    Amaral, Andre R. S.
    JOURNAL OF COMBINATORIAL OPTIMIZATION, 2007, 13 (04) : 289 - 320
  • [4] An efficient approach for large scale graph partitioning
    Renzo Zamprogno
    André R. S. Amaral
    Journal of Combinatorial Optimization, 2007, 13
  • [5] Graph Partitioning for Distributed Graph Processing
    Onizuka M.
    Fujimori T.
    Shiokawa H.
    Data Science and Engineering, 2017, 2 (1) : 94 - 105
  • [6] Large Scale Graph Processing in a Distributed Environment
    Upadhyay, Nitesh
    Patel, Parita
    Cheramangalath, Unnikrishnan
    Srikant, Y. N.
    EURO-PAR 2017: PARALLEL PROCESSING WORKSHOPS, 2018, 10659 : 465 - 477
  • [7] Smart Distributed DataSets for Stream Processing
    Lopes, Tiago
    Coimbra, Miguel
    Veiga, Luis
    EURO-PAR 2021: PARALLEL PROCESSING, 2021, 12820 : 249 - 265
  • [8] DHPV: a distributed algorithm for large-scale graph partitioning
    Wilfried Yves Hamilton Adoni
    Tarik Nahhal
    Moez Krichen
    Abdeltif El byed
    Ismail Assayad
    Journal of Big Data, 7
  • [9] DHPV: a distributed algorithm for large-scale graph partitioning
    Adoni, Wilfried Yves Hamilton
    Nahhal, Tarik
    Krichen, Moez
    El byed, Abdeltif
    Assayad, Ismail
    JOURNAL OF BIG DATA, 2020, 7 (01)
  • [10] Distributed large-scale graph processing on FPGAs
    Sahebi, Amin
    Barbone, Marco
    Procaccini, Marco
    Luk, Wayne
    Gaydadjiev, Georgi
    Giorgi, Roberto
    JOURNAL OF BIG DATA, 2023, 10 (01)