Discovering and reconciling value conflicts for numerical data integration

Cited by: 39
Authors
Fan, WG
Lu, HJ
Madnick, SE
Cheung, D
Affiliations
[1] Univ Michigan, Sch Business, Dept Comp & Informat Syst, Ann Arbor, MI 48109 USA
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
[3] Peking Univ, China Natl Lab Machine Percept, Beijing, Peoples R China
[4] MIT, Alfred P Sloan Sch Management, Cambridge, MA 02139 USA
Keywords
data integration; data mining; semantic conflicts; robust regression; data quality; conversion function
DOI
10.1016/S0306-4379(01)00043-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The build-up of Information Technology capital, fueled by the Internet and the cost-effectiveness of new telecommunications technologies, has led to a proliferation of information systems that are in dire need of exchanging information but are incapable of doing so due to the lack of semantic interoperability. It is now evident that physical connectivity (the ability to exchange bits and bytes) is no longer adequate: the integration of data from autonomous and heterogeneous systems calls for the prior identification and resolution of semantic conflicts that may be present. Unfortunately, this requires the system integrator to sift through the data from disparate systems in a painstaking manner. We suggest that this process can be partially automated, and we present a methodology and technique for the discovery of potential semantic conflicts as well as the underlying data transformations needed to resolve them. Our methodology begins by classifying data value conflicts into two categories: context-independent and context-dependent. While context-independent conflicts are usually caused by unexpected errors, context-dependent conflicts are primarily a result of the heterogeneity of the underlying data sources. To facilitate data integration, data value conversion rules are proposed to describe the quantitative relationships among data values involving context-dependent conflicts. A general approach is proposed to discover data value conversion rules from the data. The approach consists of five major steps: relevant attribute analysis, candidate model selection, conversion function generation, conversion function selection, and conversion rule formation. It is being implemented in a prototype system, DIRECT, for business data using statistics-based techniques. A preliminary study using both synthetic and real-world data indicated that the proposed approach is promising. (C) 2001 Elsevier Science Ltd. All rights reserved.
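As an illustration of the idea in the abstract, the sketch below fits a linear conversion function between the same numeric attribute as reported by two sources, then flags records that the fitted function does not explain. It is not the DIRECT implementation: the abstract only names robust regression as a keyword, so the Theil-Sen estimator is swapped in here as one simple robust regression technique, and the function names `theil_sen` and `flag_unexplained`, the linear model form, and the `tolerance` parameter are all assumptions made for this example.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Fit y ~ slope * x + intercept with the Theil-Sen estimator.

    The slope is the median of all pairwise slopes, so a minority of
    erroneous records does not drag the fit away from the majority.
    (Assumed stand-in for the paper's robust regression step.)
    """
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


def flag_unexplained(xs, ys, slope, intercept, tolerance):
    """Return indices of records whose residual exceeds the tolerance.

    Records explained by the conversion function behave like
    context-dependent conflicts; the flagged ones are candidates for
    context-independent conflicts (for example, data entry errors).
    """
    return [
        i
        for i, (x, y) in enumerate(zip(xs, ys))
        if abs(y - (slope * x + intercept)) > tolerance
    ]


if __name__ == "__main__":
    # Hypothetical data: source B reports the value of source A scaled by 1.2,
    # except for one record that contains an entry error (480 instead of 48).
    source_a = [10, 20, 30, 40, 50, 60]
    source_b = [12, 24, 36, 480, 60, 72]

    slope, intercept = theil_sen(source_a, source_b)
    suspects = flag_unexplained(source_a, source_b, slope, intercept, tolerance=0.5)
    print(f"conversion rule: B = {slope:.3f} * A + {intercept:.3f}")
    print(f"records not explained by the rule: {suspects}")
```

An ordinary least-squares fit on the same data would be pulled toward the erroneous record, which is why a robust estimator is the natural choice when the conversion rule and the errors have to be separated from the same sample.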
Pages: 635-656
Page count: 22
Related papers
50 in total
  • [1] Reconciling inconsistent data in probabilistic XML data integration
    Pankowski, Tadeusz
    SHARING DATA, INFORMATION AND KNOWLEDGE, PROCEEDINGS, 2008, 5071 : 75 - 86
  • [2] Data Fusion - Resolving Data Conflicts for Integration
    Dong, Xin Luna
    Naumann, Felix
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2009, 2 (02): : 1654 - 1655
  • [3] Reconciling resource integration and value propositions - the dynamics of value co-creation
    Siltaloppi, Jaakko
    Vargo, Stephen L.
    2014 47TH HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS), 2014, : 1278 - 1284
  • [4] Discovering Conflicts of Interest across Heterogeneous Data Sources with ConnectionLens
    Anadiotis, Angelos-Christos
    Balalau, Oana
    Bouganim, Theo
    Chimienti, Francesco
    Galhardas, Helena
    Haddad, Mhd-Yamen
    Horel, Stephane
    Manolescu, Ioana
    Youssef, Youssr
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 4670 - 4674
  • [5] Learning by discovering conflicts
    Lashkia, GV
    Anthony, L
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2003, 2671 : 492 - 497
  • [6] Discovering transcriptional modules by Bayesian data integration
    Savage, Richard S.
    Ghahramani, Zoubin
    Griffin, Jim E.
    de la Cruz, Bernard J.
    Wild, David L.
    BIOINFORMATICS, 2010, 26 (12) : i158 - i167
  • [7] Spatial Data Integration and Conflicts Resolving Approaches
    Wang Yu-hong
    Hu Sheng-wu
    2009 INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT, INNOVATION MANAGEMENT AND INDUSTRIAL ENGINEERING, VOL 1, PROCEEDINGS, 2009, : 355 - 358
  • [8] Heterogeneous data-integration and data quality: Overview of conflicts
    Boufares, F.
    Ben Salem, A.
    2012 6TH INTERNATIONAL CONFERENCE ON SCIENCES OF ELECTRONICS, TECHNOLOGIES OF INFORMATION AND TELECOMMUNICATIONS (SETIT), 2012, : 867 - 874
  • [9] Reconciling European Conflicts and Insolvency Law
    McCormack, Gerard
    EUROPEAN BUSINESS ORGANIZATION LAW REVIEW, 2014, 15 (03) : 309 - 336