An Efficient and Robust Approach for Discovering Data Quality Rules

Cited by: 11

Authors:

Yeh, Peter Z. [1]

Puri, Colin A. [1]

Affiliations:

[1] Accenture Technol Labs, San Jose, CA USA

Source:

22ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2010), PROCEEDINGS, VOL 1 | 2010

DOI:

10.1109/ICTAI.2010.43

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Poor quality data is a growing problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. Moreover, this problem is costly to fix because significant effort and resources are required to identify a comprehensive set of rules that can detect (and correct) data defects along various data quality dimensions such as consistency, conformity, and more. Hence, many organizations employ only basic data quality rules that check for null values, format, etc. in efforts such as data profiling and data cleansing; and ignore rules that are needed to detect deeper problems such as inconsistent values across interdependent attributes. This oversight can lead to numerous problems such as inaccurate reporting of key metrics used to inform critical decisions or derive business insights. In this paper, we present an approach that efficiently and robustly discovers data quality rules - in particular conditional functional dependencies - for detecting inconsistencies in data and hence improves data quality along the critical dimension of consistency. We evaluate our approach empirically on several real-world data sets. We show that our approach performs well on these data sets for metrics such as precision and recall. We also compare our approach to an established solution and show that our approach outperforms this solution for the same metrics. Finally, we show that our approach scales efficiently with the number of records, the number of attributes, and the domain size.
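
To make the notion of a conditional functional dependency (CFD) concrete, the following is a minimal sketch of how a single, already-known CFD could be checked against a table. It is not the discovery approach from the paper; the function name `cfd_violations`, the sample rows, and the rule "for records with country = US, zip determines city" are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cfd_violations(rows, condition, lhs, rhs):
    """Return sorted indices of rows that violate the CFD (condition: lhs -> rhs).

    condition: dict mapping attribute -> constant; the rule applies only to
               rows matching every pair (the "conditional" part).
    lhs:       tuple of attribute names on the left-hand side.
    rhs:       single attribute name that lhs is expected to determine.
    """
    # Group the indices of matching rows by their left-hand-side values.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if all(row.get(attr) == value for attr, value in condition.items()):
            key = tuple(row[attr] for attr in lhs)
            groups[key].append(i)

    # A group is inconsistent if it contains more than one distinct rhs value.
    violations = []
    for indices in groups.values():
        if len({rows[i][rhs] for i in indices}) > 1:
            violations.extend(indices)
    return sorted(violations)


# Hypothetical sample data (not taken from the paper's data sets).
rows = [
    {"country": "US", "zip": "95113", "city": "San Jose"},
    {"country": "US", "zip": "95113", "city": "San Jose"},
    {"country": "US", "zip": "95113", "city": "Santa Clara"},
    {"country": "UK", "zip": "95113", "city": "London"},
]

print(cfd_violations(rows, {"country": "US"}, ("zip",), "city"))  # [0, 1, 2]
```

The sketch flags every row in an inconsistent group rather than guessing which value is correct; the UK row is ignored because it does not satisfy the rule's condition.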

Pages: 8
