Big Data Pre-Processing: A Quality Framework

Cited by: 65
Authors
Taleb, Ikbal [1]
Dssouli, Rachida [1]
Serhani, Mohamed Adel [2]
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ, Canada
[2] UAE Univ, Coll Informat Technol, Al Ain, U Arab Emirates
Keywords
Big Data; Data Quality; Pre-processing; Data Provenance; Challenges; Management; Analytics
DOI
10.1109/BigDataCongress.2015.35
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
With the abundance of raw data generated from various sources, Big Data has become a preeminent approach to acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of information. Therefore, Quality of Big Data (QBD) has become an important factor in ensuring that data quality is maintained throughout all Big Data processing phases. This paper addresses QBD at the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes that support data quality profile selection and adaptation. In addition, the model tracks and records in a data provenance repository the effect of every data transformation performed during the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The results illustrate the importance of addressing QBD at an early phase of the Big Data processing lifecycle, since doing so significantly saves costs and enables accurate data analysis.
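The abstract describes the model only at a high level. As an illustration, the following is a minimal Python sketch of pre-processing steps (cleansing, filtering, normalization) whose effect on data quality is logged to a provenance list after every transformation, followed by a check against a quality profile. It assumes records are dictionaries of numeric attributes and uses completeness (the share of non-missing values) as the only quality dimension. All names (`PreProcessor`, `ProvenanceEntry`, `meets_profile`, the `min_completeness` profile key) and the specific cleansing, filtering, and normalization rules are hypothetical; this is not the authors' implementation, and integration and profile adaptation are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Assumed record shape: attribute name -> numeric value (None marks a missing value).
Record = Dict[str, Optional[float]]
Step = Callable[[List[Record]], List[Record]]


def completeness(rows: List[Record], attrs: List[str]) -> float:
    """Share of non-missing values over `attrs`; 1.0 for an empty input."""
    total = len(rows) * len(attrs)
    if total == 0:
        return 1.0
    present = sum(1 for r in rows for a in attrs if r.get(a) is not None)
    return present / total


@dataclass
class ProvenanceEntry:
    """Hypothetical provenance record: the effect of one transformation."""
    step: str
    rows_in: int
    rows_out: int
    completeness_before: float
    completeness_after: float


@dataclass
class PreProcessor:
    """Applies steps in order and logs each step's effect (in memory, not a real repository)."""
    attrs: List[str]
    provenance: List[ProvenanceEntry] = field(default_factory=list)

    def apply(self, name: str, step: Step, rows: List[Record]) -> List[Record]:
        before = completeness(rows, self.attrs)
        out = step(rows)
        self.provenance.append(ProvenanceEntry(
            name, len(rows), len(out), before, completeness(out, self.attrs)))
        return out


def cleanse(attrs: List[str]) -> Step:
    """Cleansing (one possible rule): drop records missing any of `attrs`."""
    return lambda rows: [r for r in rows if all(r.get(a) is not None for a in attrs)]


def filter_range(attr: str, lo: float, hi: float) -> Step:
    """Filtering: keep records whose `attr` lies in [lo, hi]."""
    return lambda rows: [r for r in rows
                         if r.get(attr) is not None and lo <= r[attr] <= hi]


def min_max(attr: str) -> Step:
    """Normalization: min-max rescale `attr` to [0, 1]; a constant column maps to 0.0."""
    def step(rows: List[Record]) -> List[Record]:
        vals = [r[attr] for r in rows if r.get(attr) is not None]
        if not vals:
            return rows
        lo, hi = min(vals), max(vals)
        out = []
        for r in rows:
            v = r.get(attr)
            scaled = None if v is None else ((v - lo) / (hi - lo) if hi > lo else 0.0)
            out.append({**r, attr: scaled})  # copy, so the input rows stay unchanged
        return out
    return step


def meets_profile(rows: List[Record], attrs: List[str], profile: Dict[str, float]) -> bool:
    """Hypothetical quality profile: a single minimum-completeness target."""
    return completeness(rows, attrs) >= profile["min_completeness"]


# Usage on toy values (not the paper's EEG data).
ATTRS = ["channel", "amplitude"]
raw = [
    {"channel": 1.0, "amplitude": 10.0},
    {"channel": 2.0, "amplitude": None},
    {"channel": 3.0, "amplitude": 30.0},
    {"channel": 4.0, "amplitude": 500.0},
]
pp = PreProcessor(ATTRS)
data = pp.apply("cleansing", cleanse(ATTRS), raw)
data = pp.apply("filtering", filter_range("amplitude", 0.0, 100.0), data)
data = pp.apply("normalization", min_max("amplitude"), data)
for entry in pp.provenance:
    print(entry)
print("meets profile:", meets_profile(data, ATTRS, {"min_completeness": 0.95}))
```

Because every step runs through `apply`, the log captures row counts and completeness before and after each transformation, a simple stand-in for the per-step provenance trace the abstract mentions. A fuller system would persist these entries, cover integration, and track more quality dimensions than completeness.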
Pages: 191-198
Page count: 8
Related papers (50 in total)
  • [31] Pre-Processing Methods of Data Mining
    Saleem, Asma
    Asif, Khadim Hussain
    Ali, Ahmad
    Awan, Shahid Mahmood
    AlGhamdi, Mohammed A.
    2014 IEEE/ACM 7TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC), 2014: 451-456
  • [32] Pre-processing VDIF Data in FPGA
    Gan, Jiangying
    Xu, Zhijun
    2018 PROGRESS IN ELECTROMAGNETICS RESEARCH SYMPOSIUM (PIERS-TOYAMA), 2018: 723-728
  • [33] PRE-PROCESSING OF DATA FOR CHARACTER RECOGNITION
    ALCORN, TM
    HOGGAR, CW
    MARCONI REVIEW, 1969, 32 (172): 61 ff.
  • [34] Pre-processing Agilent microarray data
    Zahurak, Marianna
    Parmigiani, Giovanni
    Yu, Wayne
    Scharpf, Robert B.
    Berman, David
    Schaeffer, Edward
    Shabbeer, Shabana
    Cope, Leslie
    BMC BIOINFORMATICS, 2007, 8 (1)
  • [35] Pre-processing Framework for Twitter Sentiment Classification
    Dritsas, Elias
    Vonitsanos, Gerasimos
    Livieris, Ioannis E.
    Kanavos, Andreas
    Ilias, Aristidis
    Makris, Christos
    Tsakalidis, Athanasios
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS (AIAI 2019), 2019, 560: 138-149
  • [37] PRESISTANT: Data Pre-processing Assistant
    Bilalli, Besim
    Abello, Alberto
    Aluja-Banet, Tomas
    Munir, Rana Faisal
    Wrembel, Robert
    INFORMATION SYSTEMS IN THE BIG DATA ERA, 2018, 317: 57-65
  • [38] A Pre-processing framework for spectral classification of hyperspectral images
    Simranjit Singh
    Singara Singh Kasana
    Multimedia Tools and Applications, 2021, 80: 243-261
  • [39] Online calibration and pre-processing of TAMA data
    Tatsumi, D
    Tsunesada, Y
    CLASSICAL AND QUANTUM GRAVITY, 2004, 21 (05): S451-S456
  • [40] Data pre-processing pipeline generation for AutoETL
    Giovanelli, Joseph
    Bilalli, Besim
    Abello, Alberto
    INFORMATION SYSTEMS, 2022, 108