Efficient incremental loading in ETL processing for real-time data integration

Authors
Neepa Biswas
Anamitra Sarkar
Kartick Chandra Mondal
Affiliation
[1] Jadavpur University, Department of Information Technology
Keywords
Data warehouse; Code-based ETL; ETL tools; Pygrametl; Petl; Scriptella; Incremental load; Bulk load; CDC
DOI
Not available
Abstract
ETL (extract, transform, load) is the widely used standard process for creating and maintaining a data warehouse (DW). ETL is the most resource-, cost- and time-demanding process in DW implementation and maintenance. Nowadays, many graphical user interface (GUI)-based solutions are available to facilitate ETL processes. Despite the high popularity of GUI-based tools, this approach still has some downsides. This paper focuses on an alternative approach to ETL development: hand coding. In some contexts, such as research and academic work, it is appropriate to choose a custom-coded solution, which can be cheaper, faster and more maintainable than GUI-based tools. Several well-known code-based open-source ETL tools developed in academia are studied in this article, and their architecture and implementation details are discussed. The aim of this paper is to present a comparative evaluation of these code-based ETL tools. Finally, an efficient ETL model is designed to meet present-day near real-time requirements.
Pages: 53 - 61
Page count: 8