Duplicate Detection Exploiting Data Relationships

Authors
Herschel, Melanie [1]
Affiliation
[1] Univ Tubingen, Wilhelm Schickard Inst Informat, Lehrstuhl Datenbanksyst, Sand 13, D-72076 Tubingen, Germany
Source
IT-INFORMATION TECHNOLOGY | 2009 / Vol. 51 / Issue 04
Keywords
H.2 [Information Systems]: Database Management; H.2.5 [Information Systems]: Database Management: Heterogeneous Databases; duplicate detection; algorithms; performance; data quality; data integration
DOI
10.1524/itit.2009.0546
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Duplicate detection consists of identifying multiple, different database representations of the same real-world object. State-of-the-art duplicate detection systems usually concentrate on identifying duplicates in a single relational table, ignoring that the data may exist in a larger context which, when considered, can significantly improve the performance of duplicate detection. In this paper, we present algorithms that exploit relationships that exist in the data.
Pages: 231-234
Number of pages: 4
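Illustration
The abstract's central idea, that related data can sharpen duplicate decisions, can be pictured with a small sketch. This is not the paper's algorithm, which this record does not describe. It is a minimal, hypothetical example that assumes publication records with a `title` and a set of related `authors`, and it mixes a string similarity on the title with the Jaccard overlap of the related authors. The function names, the `weight` parameter, and the toy records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def text_sim(a: str, b: str) -> float:
    """Attribute-only similarity: character-level ratio of two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets of related objects (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def relational_sim(r1: dict, r2: dict, weight: float = 0.5) -> float:
    """Blend title similarity with the overlap of related authors.

    `weight` (hypothetical) is the share given to the relationship evidence.
    """
    title_part = text_sim(r1["title"], r2["title"])
    related_part = jaccard(r1["authors"], r2["authors"])
    return (1 - weight) * title_part + weight * related_part


# Toy, made-up records: p1 and p2 differ in wording but share all authors;
# p1 and p3 have identical titles but no author in common.
p1 = {"title": "Duplicate Detection in Databases",
      "authors": {"A. Meyer", "B. Schmidt"}}
p2 = {"title": "Detecting Duplicates in Data Bases",
      "authors": {"A. Meyer", "B. Schmidt"}}
p3 = {"title": "Duplicate Detection in Databases",
      "authors": {"C. Lopez"}}

if __name__ == "__main__":
    print("p1 vs p2, title only:", round(text_sim(p1["title"], p2["title"]), 3))
    print("p1 vs p2, with authors:", round(relational_sim(p1, p2), 3))
    print("p1 vs p3, title only:", round(text_sim(p1["title"], p3["title"]), 3))
    print("p1 vs p3, with authors:", round(relational_sim(p1, p3), 3))
```

In this sketch, shared authors raise the score of the differently worded pair, while the identically titled pair with disjoint authors is pulled down. A real system would also have to decide in which order to compare related objects and how to set the weight, which this example leaves open.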