A FRAMEWORK FOR DATA CLEANING IN DATA WAREHOUSES

Cited by: 0
Authors
Peng, Taoxin [1]
Affiliations
[1] Napier Univ, Sch Comp, Edinburgh EH10 5DT, Midlothian, Scotland
Keywords
Data Cleaning; Data Quality; Data Integration; Data Warehousing
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is crucial to meeting it. A set of methods and tools has been developed for this purpose. However, at least two questions remain to be answered: how can the efficiency of data cleaning be improved, and how can its degree of automation be increased? This paper addresses these two questions by presenting a novel framework that manages data cleaning in data warehouses by focusing on the use of data quality dimensions and by decoupling the cleaning process into several sub-processes. An initial test run of the processes in the framework demonstrates that the approach is efficient and scalable for data cleaning in data warehouses.
Pages: 473-478
Page count: 6