A Fine-Grained Distribution Approach for ETL Processes in Big Data Environments

Cited by: 15
Authors
Bala, Mahfoud [1]
Boussaid, Omar [2]
Alimazighi, Zaia [3]
Affiliations
[1] Saad Dahleb Univ, Dept Informat, Blida 1, Blida, Algeria
[2] Univ Lyon 2, Lyon, France
[3] USTHB, Dept Informat, Algiers, Algeria
Keywords
Data Warehousing; ETL; Parallel and Distributed Processing; Big Data; MapReduce; Model
DOI
10.1016/j.datak.2017.08.003
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Among the so-called "4Vs" (volume, velocity, variety, and veracity) that characterize the complexity of Big Data, this paper focuses on "volume" in order to ensure good performance for Extracting-Transforming-Loading (ETL) processes. We propose a new fine-grained parallelization/distribution approach for populating the Data Warehouse (DW). Unlike prior approaches, which distribute the ETL process only at a coarse-grained level of processing, our approach provides different ways of parallelization/distribution at the process, functionality, and elementary-function levels. An ETL process is described in terms of its core functionalities, which can run on a cluster of computers according to the MapReduce (MR) paradigm. The approach thereby allows the ETL process to be distributed at three levels: the "process" level for coarse-grained distribution, and the "functionality" and "elementary function" levels for fine-grained distribution. Our performance analysis reveals that employing 25 to 38 parallel tasks enables the approach to speed up the ETL process by up to 33%, with the improvement rate being linear.
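As an illustration only, the idea of running a single ETL functionality under the MapReduce paradigm might be sketched as follows. This is not the authors' implementation: the sample rows, the choice of an aggregation functionality, and all names (`split`, `convert`, `map_task`, `reduce_task`, `run_aggregation`) are assumptions made for the example, and a local thread pool stands in for a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source rows: (product, amount as a raw string).
ROWS = [("a", "10"), ("b", "5"), ("a", "7"), ("c", "1"), ("b", "3"), ("a", "2")]


def split(rows, n_tasks):
    """Partition the input so that each parallel task receives one slice."""
    return [rows[i::n_tasks] for i in range(n_tasks)]


def convert(row):
    """Elementary function (assumed): cast one row's amount to an integer."""
    product, amount = row
    return product, int(amount)


def map_task(partition):
    """Map side of one functionality: convert rows, then pre-aggregate per key."""
    partial = defaultdict(int)
    for product, amount in map(convert, partition):
        partial[product] += amount
    return dict(partial)


def reduce_task(partials):
    """Reduce side: merge the partial aggregates produced by all map tasks."""
    totals = defaultdict(int)
    for partial in partials:
        for product, amount in partial.items():
            totals[product] += amount
    return dict(totals)


def run_aggregation(rows, n_tasks=3):
    """Run the aggregation functionality over n_tasks parallel map tasks."""
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        partials = list(pool.map(map_task, split(rows, n_tasks)))
    return reduce_task(partials)


if __name__ == "__main__":
    print(run_aggregation(ROWS))
```

In this sketch the number of parallel tasks changes only how the rows are partitioned, not the result, which is the property that makes a functionality safe to distribute in this way.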
Pages: 114-136
Page count: 23
Related papers
  • [1] Fine-Grained Provenance for Matching & ETL
    Zheng, Nan
    Alawini, Abdussalam
    Ives, Zachary G.
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 184 - 195
  • [2] Forecasting Fine-Grained Air Quality Based on Big Data
    Zheng, Yu
    Yi, Xiuwen
    Li, Ming
    Li, Ruiyuan
    Shan, Zhangqing
    Chang, Eric
    Li, Tianrui
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 2267 - 2276
  • [3] Towards Fine-Grained Dataflow Parallelism in Big Data Systems
    Ertel, Sebastian
    Adam, Justus
    Castrillon, Jeronimo
    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2017, 2019, 11403 : 281 - 282
  • [4] Fine-Grained Data Distribution Operations for Particle Codes
    Hofmann, Michael
    Ruenger, Gudula
    RECENT ADVANCES IN PARALLEL VIRTUAL MACHINE AND MESSAGE PASSING INTERFACE, PROCEEDINGS, 2009, 5759 : 54 - 63
  • [5] Fine-grained workflow in heterogeneous environments
    Curran, Oisin
    Downes, Paddy
    Cunniffe, John
    Shearer, Andy
    PROCEEDINGS OF THE 16TH EUROMICRO CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, 2008, : 115 - +
  • [6] Microlearner: A fine-grained Learning Optimizer for Big Data Workloads at Microsoft
    Jindal, Alekh
    Qiao, Shi
    Sen, Rathijit
    Patel, Hiren
    2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 2423 - 2434
  • [7] Fine-Grained Dynamic Resource Allocation for Big-Data Applications
    Baresi, Luciano
    Leva, Alberto
    Quattrocchi, Giovanni
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2021, 47 (08) : 1668 - 1682
  • [8] Construct Fine-grained Energy Big Data Using NILM Technology
    Liu, Yan
    Yuan, Ruiming
    Yang, Xiaokun
    Liu, Bo
    Zhang, Ruiqi
    2021 3RD ASIA ENERGY AND ELECTRICAL ENGINEERING SYMPOSIUM (AEEES 2021), 2021, : 1160 - 1164
  • [9] Fine-Grained Knowledge Sharing in Collaborative Environments
    Guan, Ziyu
    Yang, Shengqi
    Sun, Huan
    Srivatsa, Mudhakar
    Yan, Xifeng
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (08) : 2163 - 2174
  • [10] A FINE-GRAINED ANALYSIS ON DISTRIBUTION SHIFT
    Wiles, Olivia
    Gowal, Sven
    Stimberg, Florian
    Rebuffi, Sylvestre-Alvise
    Ktena, Ira
    Dvijotham, Krishnamurthy
    Cemgil, Taylan
    ICLR 2022 - 10th International Conference on Learning Representations, 2022