A Stream Partitioning Approach to Processing Large Scale Distributed Graph Datasets

Authors
Wang, Rui [1]
Chiu, Kenneth [1]
Affiliations
[1] SUNY Binghamton, Dept Comp Sci, Binghamton, NY 13901 USA
Keywords
communication cost; dataset partitioning; online algorithm; graph partitioning; large scale; RDF dataset
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
RDF datasets are an important source of big data, but many are too large to fit on a single machine. One way to address this is to partition the RDF graph across multiple machines, with each partition residing on a single machine. A poor partition, however, can incur significant communication costs if many queries end up involving multiple machines. A number of existing partitioning schemes seek to reduce these costs by finding partitions that avoid cutting edges in the RDF graph. While these schemes can find good partitions, the partitioning process itself is often not very scalable and cannot handle incrementally generated RDF data. In this paper, we develop a more scalable, effective, and low-complexity approach, online graph dataset partitioning, which produces high-quality dataset partitions with fewer links between partitions. We show experimentally that it is effective at reducing the communication cost of query processing while also improving the scalability of the partitioning itself.
Pages: 6
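
The abstract does not spell out the paper's algorithm, so the following is only a generic illustration of the idea it describes: assigning vertices to partitions in a single pass as triples arrive, while trying to keep linked vertices together. It is a minimal sketch of a greedy streaming heuristic, not the authors' method. The function names `stream_partition` and `cut_edges`, the soft `capacity` limit, and the sample triples are all assumptions made for illustration.

```python
def stream_partition(triples, k, capacity):
    """One-pass greedy vertex placement (illustrative heuristic only).

    triples  -- iterable of (subject, predicate, object) tuples
    k        -- number of partitions (machines)
    capacity -- soft limit on vertices per partition
    """
    assignment = {}   # vertex -> partition index
    load = [0] * k    # number of vertices placed on each partition

    def place(vertex, preferred):
        # Prefer the partition of an already-placed neighbor if it has room;
        # otherwise fall back to the least-loaded partition.
        if preferred is not None and load[preferred] < capacity:
            target = preferred
        else:
            target = min(range(k), key=lambda i: load[i])
        assignment[vertex] = target
        load[target] += 1

    for s, _p, o in triples:
        if s not in assignment:
            place(s, assignment.get(o))
        if o not in assignment:
            place(o, assignment.get(s))
    return assignment


def cut_edges(triples, assignment):
    """Count triples whose subject and object live on different partitions."""
    return sum(1 for s, _p, o in triples if assignment[s] != assignment[o])


# Hypothetical sample data: two chains joined by one link.
triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("d", "knows", "e"),
    ("e", "knows", "f"),
    ("c", "knows", "d"),
]
assignment = stream_partition(triples, k=2, capacity=3)
print(assignment, cut_edges(triples, assignment))
```

In this sketch each cut edge stands in for a query step that would have to cross machines, which is the communication cost the abstract refers to. Because each triple is examined once and never revisited, the same loop can keep consuming triples that are generated incrementally.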