Big Data Pre-Processing: A Quality Framework

Cited by: 65
Authors
Taleb, Ikbal [1]
Dssouli, Rachida [1]
Serhani, Mohamed Adel [2]
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ, Canada
[2] UAE Univ, Coll Informat Technol, Al Ain, U Arab Emirates
Keywords
Big Data; Data Quality; Pre-processing; Data Provenance; Challenges; Management; Analytics
DOI
10.1109/BigDataCongress.2015.35
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
With the abundance of raw data generated from various sources, Big Data has become a preeminent approach for acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of information. Therefore, Quality of Big Data (QBD) has become an important factor in ensuring that data quality is maintained at all Big Data processing phases. This paper addresses QBD at the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes to support data quality profile selection and adaptation. In addition, the model tracks and registers in a data provenance repository the effect of every data transformation that happens in the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The obtained results illustrate the importance of addressing QBD at an early phase of the Big Data processing lifecycle, since doing so significantly saves on costs and enables accurate data analysis.
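The abstract's idea of registering the effect of each pre-processing transformation can be illustrated with a minimal Python sketch. This is not the authors' QBD model: the `ProvenanceLog` class, the `completeness` metric, the two example steps, and the sample records are all hypothetical names and data chosen for illustration, and an in-memory list stands in for a provenance repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]


def completeness(records: List[Record]) -> float:
    """Fraction of non-missing values over all fields (1.0 for empty input)."""
    total = sum(len(r) for r in records)
    if total == 0:
        return 1.0
    present = sum(1 for r in records for v in r.values() if v is not None)
    return present / total


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory stand-in for a data provenance repository."""
    entries: List[dict] = field(default_factory=list)

    def apply(self, name: str,
              step: Callable[[List[Record]], List[Record]],
              records: List[Record]) -> List[Record]:
        # Measure quality before and after the step, then register the effect.
        before = completeness(records)
        result = step(records)
        self.entries.append({
            "step": name,
            "rows_in": len(records),
            "rows_out": len(result),
            "completeness_before": before,
            "completeness_after": completeness(result),
        })
        return result


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Cleansing: discard records that contain any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def min_max_normalize(records: List[Record]) -> List[Record]:
    """Normalization: rescale each field to [0, 1]; assumes no missing values."""
    if not records:
        return []
    keys = list(records[0].keys())
    lo = {k: min(r[k] for r in records) for k in keys}
    hi = {k: max(r[k] for r in records) for k in keys}
    return [
        {k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
         for k in keys}
        for r in records
    ]


# Made-up sample records (illustrative only, not data from the paper).
raw = [
    {"ch1": 2.0, "ch2": 10.0},
    {"ch1": None, "ch2": 30.0},
    {"ch1": 4.0, "ch2": 20.0},
    {"ch1": 6.0, "ch2": None},
]

log = ProvenanceLog()
cleaned = log.apply("cleansing", drop_incomplete, raw)
normalized = log.apply("normalization", min_max_normalize, cleaned)

for entry in log.entries:
    print(entry)
```

Each log entry pairs a transformation with the quality it started from and the quality it produced, which is the kind of record that would let a later stage decide whether a pre-processing step was worth its cost.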
Pages: 191-198
Page count: 8
Related papers
50 in total
  • [21] A Study on Pre-processing and Fault Analysis using PMU Big Data of Substation
    Lee K.-M.
    Park C.-W.
    Transactions of the Korean Institute of Electrical Engineers, 2023, 72 (09): : 975 - 981
  • [22] Neural Pre-processing: A Learning Framework for End-to-End Brain MRI Pre-processing
    He, Xinzi
    Wang, Alan Q.
    Sabuncu, Mert R.
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VIII, 2023, 14227 : 258 - 267
  • [23] A framework to simplify pre-processing location-based social media big data for sustainable urban planning and management
    Abdul-Rahman, Mohammed
    Chan, Edwin H. W.
    Wong, Man Sing
    Irekponor, Victor E.
    Abdul-Rahman, Maryam O.
    CITIES, 2021, 109
  • [24] A Smart Data Pre-Processing Approach to Effective Management of Big Health Data in IoT Edge
    Kaya, Sukru Mustafa
    Erdem, Atakan
    Gunes, Ali
    SMART HOMECARE TECHNOLOGY AND TELEHEALTH, 2021, 8 : 9 - 21
  • [25] On Pre-processing Algorithms for Data Stream
    Duda, Piotr
    Jaworski, Maciej
    Pietruczuk, Lena
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT II, 2012, 7268 : 56 - 63
  • [26] THREE PRE-PROCESSING STEPS TO INCREASE THE QUALITY OF KINECT RANGE DATA
    Davoodianidaliki, M.
    Saadatseresht, M.
    SMPR CONFERENCE 2013, 2013, 40-1-W3 : 127 - 132
  • [27] Kurtosis removal for data pre-processing
    Loperfido, Nicola
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2023, 17 (01) : 239 - 267
  • [28] Kurtosis removal for data pre-processing
    Nicola Loperfido
    Advances in Data Analysis and Classification, 2023, 17 : 239 - 267
  • [29] Intelligent assistance for data pre-processing
    Bilalli, Besim
    Abello, Alberto
    Aluja-Banet, Tomas
    Wrembel, Robert
    COMPUTER STANDARDS & INTERFACES, 2018, 57 : 101 - 109
  • [30] A NEW METHOD FOR DATA PRE-PROCESSING
    RAISINGHANI, SC
    BILIMORIA, KD
    JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 1984, 7 (02) : 255 - 256