An Efficient and Robust Approach for Discovering Data Quality Rules

Cited by: 11
Authors
Yeh, Peter Z. [1]
Puri, Colin A. [1]
Affiliations
[1] Accenture Technology Labs, San Jose, CA, USA
DOI
10.1109/ICTAI.2010.43
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Poor quality data is a growing problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. Moreover, this problem is costly to fix because significant effort and resources are required to identify a comprehensive set of rules that can detect (and correct) data defects along various data quality dimensions such as consistency, conformity, and more. Hence, many organizations employ only basic data quality rules that check for null values, format, etc. in efforts such as data profiling and data cleansing; and ignore rules that are needed to detect deeper problems such as inconsistent values across interdependent attributes. This oversight can lead to numerous problems such as inaccurate reporting of key metrics used to inform critical decisions or derive business insights. In this paper, we present an approach that efficiently and robustly discovers data quality rules - in particular conditional functional dependencies - for detecting inconsistencies in data and hence improves data quality along the critical dimension of consistency. We evaluate our approach empirically on several real-world data sets. We show that our approach performs well on these data sets for metrics such as precision and recall. We also compare our approach to an established solution and show that our approach outperforms this solution for the same metrics. Finally, we show that our approach scales efficiently with the number of records, the number of attributes, and the domain size.
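As an illustration of the kind of rule the abstract refers to, the following is a minimal sketch of checking one conditional functional dependency against a handful of records. It is not the authors' discovery algorithm. The function name `cfd_violations`, the sample records, and the rule "for records where country is US, zip determines city" are all assumptions made up for this example.

```python
from collections import defaultdict


def cfd_violations(rows, condition, lhs, rhs):
    """Find groups of rows that break a conditional functional dependency.

    The rule checked is: among rows matching `condition` (a dict of
    attribute -> required value), rows that agree on every attribute in
    `lhs` must also agree on `rhs`. All names here are hypothetical.
    """
    values_seen = defaultdict(set)   # lhs key -> distinct rhs values
    row_indexes = defaultdict(list)  # lhs key -> indexes of matching rows
    for index, row in enumerate(rows):
        # Only rows that satisfy the condition are subject to the rule.
        if all(row.get(attr) == value for attr, value in condition.items()):
            key = tuple(row[attr] for attr in lhs)
            values_seen[key].add(row[rhs])
            row_indexes[key].append(index)
    # A group is inconsistent when one lhs key maps to several rhs values.
    return {key: row_indexes[key]
            for key, values in values_seen.items() if len(values) > 1}


# Hypothetical sample data: the third record has a misspelled city.
rows = [
    {"country": "US", "zip": "95113", "city": "San Jose"},
    {"country": "US", "zip": "95113", "city": "San Jose"},
    {"country": "US", "zip": "95113", "city": "Sna Jose"},
    {"country": "UK", "zip": "95113", "city": "London"},
]

violations = cfd_violations(rows, {"country": "US"}, ["zip"], "city")
print(violations)
```

The check only reports which records are mutually inconsistent. Deciding which value is the correct one, and discovering such rules from the data in the first place, is outside the scope of this sketch.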
Pages: 8
Related papers
50 in total
  • [41] An Efficient Approach to Discovering Frequent Patterns from Data Cube using Aggregation and Directed Graph
    Singh, Kuldeep
    Shakya, Harish Kumar
    Biswas, Bhaskar
    6TH INTERNATIONAL CONFERENCE ON COMPUTER & COMMUNICATION TECHNOLOGY (ICCCT-2015), 2015, : 31 - 35
  • [42] An efficient approach to categorising association rules
    Won, Dongwoo
    McLeod, Dennis
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2012, 4 (04) : 309 - 333
  • [43] A graph-based approach for discovering various types of association rules
    Yen, SJ
    Chen, ALP
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2001, 13 (05) : 839 - 845
  • [44] Discovering rules to design newspapers: An inductive constraint logic programming approach
    Bernard, M
    Jacquenet, F
    APPLIED ARTIFICIAL INTELLIGENCE, 1998, 12 (06) : 547 - 567
  • [45] Discovering Dispatching Rules for Job Shop Scheduling Using Data Mining
    Balasundaram, R.
    Baskar, N.
    Sankar, R. Siva
    ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY, VOL 3, 2013, 178 : 63 - +
  • [46] Discovering Rules with Genetic Algorithms to Classify Urban Remotely Sensed Data
    Sheeren, D.
    Quirin, A.
    Puissant, A.
    Gancarski, P.
    Weber, C.
    2006 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, VOLS 1-8, 2006, : 3919 - +
  • [47] Fuzzy data mining for discovering changes in association rules over time
    Au, WH
    Chan, KCC
    PROCEEDINGS OF THE 2002 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOL 1 & 2, 2002, : 890 - 895
  • [48] Robust analysis and optimization of a novel efficient quality assurance model in data warehousing
    Amuthabala, P.
    Santhosh, R.
    COMPUTERS & ELECTRICAL ENGINEERING, 2019, 74 : 233 - 244
  • [49] An efficient approach for discovering Graph Entity Dependencies (GEDs)
    Liu, Dehua
    Kwashie, Selasi
    Zhang, Yidi
    Zhou, Guangtong
    Bewong, Michael
    Wu, Xiaoying
    Guo, Xi
    He, Keqing
    Feng, Zaiwen
    INFORMATION SYSTEMS, 2024, 125
  • [50] An Efficient Approach to Discovering Sequential Patterns in Large Databases
    Yen, Show-Jane
    Cho, Chung-Wen
    LECTURE NOTES IN COMPUTER SCIENCE, 2000, 1910 : 685 - 690