Quality-Based Online Data Reconciliation

Cited by: 2
Authors
Abboura, Asma [1]
Sahri, Soror [2]
Baba-Hamed, Latifa [1]
Ouziri, Mourad [2]
Benbernou, Salima [2]
Affiliations
[1] Univ Oran 1, Oran, Algeria
[2] Univ Paris 05, Sorbonnes Paris Cite, LIPADE Lab, Paris, France
Keywords
Duplicates; data reconciliation; data quality rules; source quality
DOI
10.1145/2806888
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
One of the main challenges in data matching and data cleaning in highly integrated systems is duplicate detection. While the literature abounds in approaches for detecting duplicates that correspond to the same real-world entity, most of these approaches tend to eliminate duplicates (wrong information) from the sources, leading to what is called data repair. In this article, we propose a framework that automatically detects duplicates at query time and effectively identifies the consistent version of the data, while keeping inconsistent data in the sources. Our framework uses matching dependencies (MDs) to detect duplicates through the concept of data reconciliation rules (DRRs), and conditional functional dependencies (CFDs) to assess the quality of different attribute values. We also build a duplicate reconciliation index (DRI), based on clusters of duplicates detected by a set of DRRs, to speed up the online data reconciliation process. Our experiments on a real-world data collection show the efficiency and effectiveness of our framework.
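To make the pipeline described in the abstract more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. The matching rule `drr_match`, the 0.85 similarity threshold, the single constant CFD in `CFDS`, the toy records, and the "CFD-consistent first, then most frequent" selection policy are all hypothetical choices made for illustration. The sketch shows three ideas: an MD-style rule flags duplicates, duplicate clusters are precomputed into an index (a toy stand-in for the DRI), and at query time CFD consistency is used to choose attribute values while the source records stay untouched.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher

# Toy source tuples (hypothetical). They are never modified: no data repair.
RECORDS = {
    1: {"name": "John Smith", "zip": "75006", "city": "Paris"},
    2: {"name": "Jon Smith", "zip": "75006", "city": "Pari"},
    3: {"name": "John Smith", "zip": "75006", "city": "Paris"},
    4: {"name": "Alice Martin", "zip": "31000", "city": "Toulouse"},
}

# Hypothetical constant CFD: (zip = '75006') -> (city = 'Paris').
CFDS = [("zip", "75006", "city", "Paris")]


def similar(a, b, threshold=0.85):
    """Approximate string equality used by the hypothetical matching rule."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def drr_match(r, s):
    """Hypothetical MD-style rule: equal zip and similar name => duplicates."""
    return r["zip"] == s["zip"] and similar(r["name"], s["name"])


def build_dri(records):
    """Offline step: cluster duplicates with union-find.

    Returns a toy index mapping each record id to the ids in its cluster.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ids = sorted(records)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if drr_match(records[a], records[b]):
                parent[find(b)] = find(a)

    clusters = defaultdict(list)
    for rid in ids:
        clusters[find(rid)].append(rid)
    return {rid: clusters[find(rid)] for rid in ids}


def violates(record, attr, value):
    """True if using `value` for `attr` in `record` breaks a constant CFD."""
    for lhs_attr, lhs_val, rhs_attr, rhs_val in CFDS:
        if rhs_attr == attr and record.get(lhs_attr) == lhs_val and value != rhs_val:
            return True
    return False


def reconcile(records, dri, rid):
    """Query-time step: build one consistent version of rid's entity."""
    cluster = dri[rid]
    result = {}
    for attr in records[rid]:
        votes = Counter(records[r][attr] for r in cluster)
        # Prefer CFD-consistent values, then the most frequent one.
        result[attr] = max(
            votes, key=lambda v: (not violates(records[rid], attr, v), votes[v])
        )
    return result


if __name__ == "__main__":
    dri = build_dri(RECORDS)
    print(reconcile(RECORDS, dri, 2))
    # -> {'name': 'John Smith', 'zip': '75006', 'city': 'Paris'}
```

Querying record 2 returns the reconciled tuple, while `RECORDS[2]` still holds the inconsistent city value `"Pari"`. This mirrors the idea of answering queries with a consistent version without repairing the sources. Building the index offline avoids re-running the pairwise matching on every query.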
Pages: 21