UnifyDR: A Generic Framework for Unifying Data and Replica Placement

Cited by: 4
Authors
Atrey, Ankita [1]
Van Seghbroeck, Gregory [1]
Mora, Higinio [2]
Volckaert, Bruno [1]
De Turck, Filip [1]
Affiliations
[1] Univ Ghent, Internet Technol & Data Sci Lab IDLAB, IMEC, B-9052 Ghent, Belgium
[2] Univ Alicante, Dept Comp Sci Technol & Computat, Alicante 03690, Spain
Keywords
Distributed databases; Optimization; Social networking (online); Clustering algorithms; Correlation; Cloud computing; Scalability; Data placement; replica placement; OLAP; online social networks; join-intensive queries; location-based services; overlapping clustering; DATA-INTENSIVE APPLICATIONS; ALGORITHM; STRATEGY; NETWORK
DOI
10.1109/ACCESS.2020.3041670
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The advent of (big) data management applications operating at Cloud scale has led to extensive research on the data placement problem. The key objective of data placement is to obtain a partitioning (possibly allowing for replicas) of a set of data-items into distributed nodes that minimizes the overall network communication cost. Although replication is intrinsic to data placement, it has seldom been studied in combination with the latter. On the contrary, most of the existing solutions treat them as two independent problems, and employ a two-phase approach: (1) data placement, followed by (2) replica placement. We address this by proposing a new paradigm, CDR, with the objective of combining data and replica placement as a single joint optimization problem. Specifically, we study two variants of the CDR problem: (1) CDR-Single, where the objective is to minimize the communication cost alone, and (2) CDR-Multi, which performs a multi-objective optimization to also minimize traffic and storage costs. To unify data and replica placement, we propose a generic framework called UnifyDR, which leverages overlapping correlation clustering to assign a data-item to multiple nodes, thereby facilitating data and replica placement to be performed jointly. We establish the generic nature of UnifyDR by portraying its ability to address the CDR problem in two real-world use-cases, that of join-intensive online analytical processing (OLAP) queries and a location-based online social network (OSN) service. The effectiveness and scalability of UnifyDR are showcased by experiments performed on data generated using the TPC-DS benchmark and a trace of the Gowalla OSN for the OLAP queries and OSN service use-case, respectively. Empirically, the presented approach obtains an improvement of approximately 35% in terms of the evaluated metrics and a speed-up of 8 times in comparison to state-of-the-art techniques.
Pages: 216894-216910
Page count: 17