Big Data Pre-Processing: A Quality Framework

Cited by: 65
Authors
Taleb, Ikbal [1]
Dssouli, Rachida [1]
Serhani, Mohamed Adel [2]
Affiliations
[1] Concordia Univ, CIISE, Montreal, PQ, Canada
[2] UAE Univ, Coll Informat Technol, Al Ain, U Arab Emirates
Keywords
Big Data; Data Quality; Pre-processing; Data Provenance; Challenges; Management; Analytics
DOI
10.1109/BigDataCongress.2015.35
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
With the abundance of raw data generated from various sources, Big Data has become a preeminent approach for acquiring, processing, and analyzing large amounts of heterogeneous data to derive valuable evidence. The size, speed, and formats in which data is generated and processed affect the overall quality of information. Therefore, Quality of Big Data (QBD) has become an important factor in ensuring that data quality is maintained across all Big Data processing phases. This paper addresses QBD at the pre-processing phase, which includes sub-processes such as cleansing, integration, filtering, and normalization. We propose a QBD model incorporating processes that support data quality profile selection and adaptation. In addition, the model tracks and records, in a data provenance repository, the effect of every data transformation performed during the pre-processing phase. We evaluate the data quality selection module using a large EEG dataset. The results illustrate the importance of addressing QBD at an early phase of the Big Data processing lifecycle, since doing so yields significant cost savings and supports accurate data analysis.
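The abstract describes the model only at a high level. As an illustration only, the following Python sketch shows one way a pre-processing pipeline could check a data quality profile and record the effect of each transformation in a provenance repository. Everything in it is hypothetical: the names (`completeness`, `meets_profile`, `ProvenanceRepository`, `run_preprocessing`), the use of a single completeness dimension, the two example steps (cleansing and min-max normalization), and the toy `ch1`/`ch2` records are assumptions chosen for illustration, not the authors' QBD model or its interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record type: attribute name -> numeric value (None = missing).
Record = Dict[str, Optional[float]]
Transform = Callable[[List[Record]], List[Record]]


def completeness(records: List[Record], attributes: List[str]) -> float:
    """Share of non-missing values over the given attributes (1.0 if empty)."""
    total = len(records) * len(attributes)
    if total == 0:
        return 1.0
    present = sum(1 for r in records for a in attributes if r.get(a) is not None)
    return present / total


def meets_profile(records: List[Record], attributes: List[str],
                  profile: Dict[str, float]) -> bool:
    """Check a toy quality profile; only a 'completeness' threshold is modelled."""
    return completeness(records, attributes) >= profile.get("completeness", 0.0)


@dataclass
class ProvenanceEntry:
    """Hypothetical log entry describing one transformation's effect."""
    step: str
    rows_before: int
    rows_after: int
    completeness_before: float
    completeness_after: float


@dataclass
class ProvenanceRepository:
    """In-memory stand-in for a data provenance repository."""
    entries: List[ProvenanceEntry] = field(default_factory=list)

    def record(self, entry: ProvenanceEntry) -> None:
        self.entries.append(entry)


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Cleansing step: keep only records with no missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def min_max_normalize(records: List[Record]) -> List[Record]:
    """Normalization step: rescale each attribute to [0, 1].

    Assumes no missing values remain; returns copies, leaving inputs unchanged.
    """
    if not records:
        return []
    out = [dict(r) for r in records]
    for a in records[0]:
        values = [r[a] for r in records]
        lo, hi = min(values), max(values)
        for r in out:
            r[a] = 0.0 if hi == lo else (r[a] - lo) / (hi - lo)
    return out


def run_preprocessing(records: List[Record], attributes: List[str],
                      steps: List[Tuple[str, Transform]],
                      repo: ProvenanceRepository) -> List[Record]:
    """Apply each (name, transform) step and log its effect on quality."""
    for name, transform in steps:
        rows_before = len(records)
        quality_before = completeness(records, attributes)
        records = transform(records)
        repo.record(ProvenanceEntry(name, rows_before, len(records),
                                    quality_before,
                                    completeness(records, attributes)))
    return records


if __name__ == "__main__":
    # Toy data (hypothetical), not taken from the paper's EEG dataset.
    data = [{"ch1": 1.0, "ch2": 4.0}, {"ch1": None, "ch2": 5.0},
            {"ch1": 3.0, "ch2": 6.0}, {"ch1": 5.0, "ch2": None}]
    attrs = ["ch1", "ch2"]
    profile = {"completeness": 0.9}
    repo = ProvenanceRepository()
    print(meets_profile(data, attrs, profile))  # False (completeness is 0.75)
    clean = run_preprocessing(data, attrs, [("cleansing", drop_incomplete),
                                            ("normalization", min_max_normalize)],
                              repo)
    print(meets_profile(clean, attrs, profile))  # True
    for entry in repo.entries:
        print(entry)
```

Logging row counts and a quality score before and after every step keeps the provenance record independent of what each transformation does internally, so new pre-processing steps can be added without changing the logging code.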
Pages: 191-198
Number of pages: 8