Scalable Data Placement of Data-intensive Services in Geo-distributed Clouds

Cited by: 4
Authors
Atrey, Ankita [1]
Van Seghbroeck, Gregory [1]
Volckaert, Bruno [1]
De Turck, Filip [1]
Affiliations
[1] UGent, IDLAB Imec, Technol Pk, Ghent, Belgium
Keywords
Data Placement; Geo-distributed Clouds; Location-based Services; Online Social Networks; Scalability; Spectral Clustering; Hypergraphs; Approximation;
DOI
10.5220/0006767504970508
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The advent of big data analytics and cloud computing technologies has resulted in widespread research into solutions to the data placement problem, which aims at properly placing data items in distributed datacenters. Although uniformly partitioning the data across distributed nodes is the de facto standard for many popular distributed data stores such as HDFS or Cassandra, this approach may cause network congestion for data-intensive services, thereby affecting the system throughput. This is because, unlike MapReduce-style workloads, data-intensive services require access to multiple datasets within each transaction. In this paper, we propose a scalable method for performing data placement of data-intensive services into geographically distributed clouds. The proposed algorithm partitions a set of data items into geo-distributed clouds using spectral clustering on hypergraphs. Additionally, our spectral clustering algorithm leverages randomized techniques for obtaining low-rank approximations of the hypergraph matrix, thereby facilitating superior scalability for computation of the spectra of the hypergraph Laplacian. Experiments on a real-world trace-based online social network dataset show that the proposed algorithm is effective, efficient, and scalable. Empirically, its efficacy on the evaluated metrics is comparable to, and in certain scenarios better than, that of state-of-the-art techniques, while its running time is up to 10 times faster.
Pages: 497-508
Page count: 12
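
The pipeline outlined in the abstract (model co-accessed data items as a hypergraph, approximate the leading spectrum of its Laplacian with a randomized low-rank method, then cluster items into datacenters) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the Laplacian, the randomized scheme, or the clustering step, so the code assumes a commonly used normalized hypergraph Laplacian, a basic randomized range finder with power iterations, and a plain k-means. All function names, parameters, and the toy incidence matrix are hypothetical, and every item is assumed to appear in at least one transaction.

```python
import numpy as np

def hypergraph_operator(H, w=None):
    """Return A such that A @ A.T = Dv^-1/2 H W De^-1 H^T Dv^-1/2.

    H is an (items x transactions) 0/1 incidence matrix; each transaction
    is treated as one hyperedge (an assumption of this sketch).
    """
    H = np.asarray(H, dtype=float)
    w = np.ones(H.shape[1]) if w is None else np.asarray(w, dtype=float)
    d_v = H @ w            # weighted degree of each item (must be > 0)
    d_e = H.sum(axis=0)    # number of items in each hyperedge
    return (H / np.sqrt(d_v)[:, None]) * np.sqrt(w / d_e)[None, :]

def randomized_left_singular_vectors(A, k, oversample=5, n_iter=2, seed=0):
    """Approximate the top-k left singular vectors of A by random projection."""
    rng = np.random.default_rng(seed)
    size = min(k + oversample, min(A.shape))
    # Random sketch of the column space of A, then orthonormalize.
    Q = np.linalg.qr(A @ rng.standard_normal((A.shape[1], size)))[0]
    for _ in range(n_iter):  # power iterations sharpen the spectrum
        Q = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Q)[0]
    # Exact SVD on the small projected matrix, lifted back through Q.
    U_small = np.linalg.svd(Q.T @ A, full_matrices=False)[0]
    return (Q @ U_small)[:, :k]

def kmeans(X, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def place_items(H, n_datacenters, w=None):
    """Assign each data item (row of H) to one of n_datacenters groups."""
    A = hypergraph_operator(H, w)
    # Top singular vectors of A are the bottom eigenvectors of L = I - A A^T.
    U = randomized_left_singular_vectors(A, n_datacenters)
    norms = np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U / norms, n_datacenters)

# Toy trace: 6 data items (rows), 5 transactions (columns).
# Items 0-2 and items 3-5 are mostly co-accessed; one transaction links 2 and 3.
H = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
])
labels = place_items(H, n_datacenters=2)
print(labels)
```

Working with the factor A rather than the full items-by-items Laplacian keeps the randomized step proportional to the size of the incidence matrix, which is the kind of saving a low-rank approximation is meant to provide; the sketch ignores practical constraints such as datacenter capacity, replication, or latency that a real placement would need to handle.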