A Stream Partitioning Approach to Processing Large Scale Distributed Graph Datasets

Cited by: 0
Authors
Wang, Rui [1]
Chiu, Kenneth [1]
Affiliations
[1] SUNY Binghamton, Dept Comp Sci, Binghamton, NY 13901 USA
Keywords
communication cost; dataset partitioning; online algorithm; graph partitioning; large scale; RDF dataset;
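DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812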
Abstract
RDF datasets are an important source of big data. Many of them, however, are too large to fit on a single machine. One way to address this is to partition the RDF graph across multiple machines, with each component residing on a single machine. A poor partition, however, can incur significant communication costs if many queries end up involving multiple machines. A number of existing partitioning schemes seek to reduce these costs by finding partitions that avoid cutting edges in the RDF graph. While these schemes can find good partitions, the partitioning process itself is often not very scalable and cannot handle incrementally generated RDF data. In this paper, we develop a more scalable, effective, and low-complexity approach, online graph dataset partitioning, which produces high-quality dataset partitions with fewer links between partitions. We show experimentally that it works well in reducing the communication cost of query processing while also improving the scalability of the partitioning itself.
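
To make the idea of online (streaming) partitioning concrete, the following is a minimal sketch of a generic greedy streaming heuristic. It is not the authors' algorithm, which this record does not describe. The function name `stream_partition`, the parameters `k` and `capacity`, and the soft capacity rule are all assumptions made for illustration. Each triple is processed once as it arrives, and a new node is placed next to its already-placed neighbor when that partition has room, which tends to keep edges uncut.

```python
def stream_partition(triples, k, capacity):
    """Greedy one-pass partitioning of RDF nodes (illustrative sketch only).

    triples  -- iterable of (subject, predicate, object) tuples
    k        -- number of partitions (hypothetical parameter)
    capacity -- soft limit on nodes per partition (hypothetical parameter)
    Returns (assignment, cut): node -> partition index, and the number of
    triples whose endpoints ended up in different partitions.
    """
    assignment = {}
    load = [0] * k

    def place(node, preferred):
        # Follow the neighbor's partition while it has room; otherwise
        # fall back to the least-loaded partition (ties go to the lowest index).
        if preferred is not None and load[preferred] < capacity:
            target = preferred
        else:
            target = min(range(k), key=load.__getitem__)
        assignment[node] = target
        load[target] += 1

    cut = 0
    for s, _p, o in triples:
        # Place whichever endpoints have not been seen before.
        for node, other in ((s, o), (o, s)):
            if node not in assignment:
                place(node, assignment.get(other))
        if assignment[s] != assignment[o]:
            cut += 1
    return assignment, cut


triples = [("a", "p", "b"), ("b", "p", "c"), ("x", "p", "y"), ("y", "p", "z")]
assignment, cut = stream_partition(triples, k=2, capacity=3)
```

In this toy input the two chains are disjoint, so the heuristic can keep each chain on one partition. Because decisions are made per arriving triple and never revisited, the same loop can keep consuming incrementally generated data, at the price of depending on arrival order.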
Pages: 6
Related papers
50 in total
  • [31] Distributed mining of convoys in large scale datasets
    Faisal Orakzai
    Torben Bach Pedersen
    Toon Calders
    GeoInformatica, 2021, 25 : 353 - 396
  • [32] A Scalable Approach for Distributed Reasoning over Large-scale OWL Datasets
    Mohamed, Heba
    Fathalla, Said
    Lehmann, Jens
    Jabeen, Hajira
    PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (KEOD), VOL 2, 2021, : 51 - 60
  • [33] Graph Partitioning in Parallelization of Large Scale Networks
    Das, Sima
    Leopold, Jennifer
    Ghosh, Susmita
    Das, Sajal K.
    2016 IEEE 41ST CONFERENCE ON LOCAL COMPUTER NETWORKS (LCN), 2016, : 176 - 179
  • [34] COLA: Optimizing Stream Processing Applications via Graph Partitioning
    Khandekar, Rohit
    Hildrum, Kirsten
    Parekh, Sujay
    Rajan, Deepak
    Wolf, Joel
    Wu, Kun-Lung
    Andrade, Henrique
    Gedik, Bugra
    MIDDLEWARE 2009, PROCEEDINGS, 2009, 5896 : 308 - 327
  • [35] Cost-Aware Partitioning for Efficient Large Graph Processing in Geo-Distributed Datacenters
    Zhou, Amelie Chi
    Shen, Bingkun
    Xiao, Yao
    Ibrahim, Shadi
    He, Bingsheng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (07) : 1707 - 1723
  • [36] An Analysis of Distributed Programming Models and Frameworks for Large-scale Graph Processing
    Corbellini, Alejandro
    Godoy, Daniela
    Mateos, Cristian
    Schiaffino, Silvia
    Zunino, Alejandro
    IETE JOURNAL OF RESEARCH, 2022, 68 (04) : 3065 - 3073
  • [37] Adaptive Partitioning for Large-Scale Graph Analytics in Geo-Distributed Data Centers
    Zhou, Amelie Chi
    Luo, Juanyun
    Qiu, Ruibo
    Tan, Haobin
    He, Bingsheng
    Mao, Rui
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 2818 - 2830
  • [38] Hybrid Graph Partitioning with OLB Approach in Distributed Transactions
    Bharati, Rajesh
    Attar, Vahida
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 37 (01) : 763 - 775
  • [39] On The Soundness of a Language for Large and Distributed Graph Processing
    Diop, Alpha Mouhamadou
    Ba, Cheikh
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES, AICT 2024, 2024,
  • [40] Performance and Monetary Cost of Large-scale Distributed Graph Processing on Amazon Cloud
    Li, Zengxiang
    Thai Nguyen Hung
    Lu, Sifei
    Goh, Rick Siow Mong
    2016 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING RESEARCH AND INNOVATION - ICCCRI 2016, 2016, : 9 - 16