The Interaction Between Schema Matching and Record Matching in Data Integration

Cited by: 14
Authors
Gu, Binbin [1]
Li, Zhixu [1]
Zhang, Xiangliang [2]
Liu, An [1]
Liu, Guanfeng [1]
Zheng, Kai [1]
Zhao, Lei [1]
Zhou, Xiaofang [1, 3]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Jiangsu, Peoples R China
[2] King Abdullah Univ Sci & Technol, Jeddah 239556900, Thuwal, Saudi Arabia
[3] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia
Funding
Australian Research Council
Keywords
Data integration; schema matching; record matching; LINKAGE;
DOI
10.1109/TKDE.2016.2611577
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Schema Matching (SM) and Record Matching (RM) are two necessary steps in integrating multiple relational tables with different schemas: SM unifies the schemas, and RM detects records that refer to the same real-world entity. The two processes have been studied thoroughly in isolation, but little attention has been paid to the interaction between them. In this work, we find that SM and RM, even when simply alternated, can benefit from each other and reach better integration performance in terms of precision and recall. Combining SM and RM is therefore a promising way to improve data integration. To this end, we define novel matching rules for SM and RM in which every SM decision is based on intermediate RM results, and vice versa, so that SM and RM can be performed alternately. The quality of integration is guaranteed by a Matching Likelihood Estimation model and by controlling semantic drift, which together prevent mismatches from being magnified. To reduce the computational cost, we design an index structure based on q-grams and a greedy search algorithm that reduce the overhead of the interaction by around 90 percent. Extensive experiments on three data collections show that combining and interleaving SM and RM significantly outperforms previous work that conducts SM and RM separately.
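The abstract mentions an index structure based on q-grams but gives no details of it. As a rough illustration of the general idea only, and not the authors' index, the sketch below builds a plain inverted index from q-grams to value positions and uses it to retrieve likely matches for a query string without comparing it against every value. The function names (`qgrams`, `build_index`, `candidates`), the padding character, the choice of q = 2, and the Jaccard threshold of 0.5 are all assumptions made for this example.

```python
from collections import defaultdict


def qgrams(value, q=2):
    """Return the set of q-grams of a lower-cased string padded with '#'."""
    padded = "#" * (q - 1) + value.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def build_index(values, q=2):
    """Map each q-gram to the positions of the values that contain it."""
    index = defaultdict(set)
    for pos, value in enumerate(values):
        for gram in qgrams(value, q):
            index[gram].add(pos)
    return index


def candidates(query, values, index, q=2, threshold=0.5):
    """Return (value, jaccard) pairs that share q-grams with the query.

    Only values reachable through the index are scored, so values with
    no q-gram in common with the query are never compared.
    """
    grams = qgrams(query, q)
    shared = defaultdict(int)  # position -> number of shared q-grams
    for gram in grams:
        for pos in index.get(gram, ()):
            shared[pos] += 1
    result = []
    for pos, count in shared.items():
        union = len(grams | qgrams(values[pos], q))
        jaccard = count / union
        if jaccard >= threshold:
            result.append((values[pos], jaccard))
    # Highest similarity first.
    return sorted(result, key=lambda pair: -pair[1])
```

Calling `build_index` once on the values of one column and then `candidates` for each value of another column keeps the comparison work proportional to the number of values that actually share q-grams, which is the usual motivation for this kind of index.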
Pages: 186-199
Number of pages: 14