Duplicate Detection Exploiting Data Relationships

Cited by: 0
Author
Herschel, Melanie [1]
Affiliation
[1] Univ Tubingen, Wilhelm Schickard Inst Informat, Lehrstuhl Datenbanksyst, Sand 13, D-72076 Tubingen, Germany
Source
IT-INFORMATION TECHNOLOGY | 2009 / Vol. 51 / No. 04
Keywords
H.2 [Information Systems: Database Management]; H.2.5 [Information Systems: Database Management: Heterogeneous Databases]; duplicate detection; algorithms; performance; data quality; data integration
DOI
10.1524/itit.2009.0546
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Duplicate detection consists of identifying multiple, different database representations of the same real-world object. State-of-the-art duplicate detection systems usually concentrate on identifying duplicates within a single relational table and thereby ignore that the data may exist in a larger context which, when considered, can significantly improve the performance of duplicate detection. In this paper, we present algorithms that exploit the relationships that exist in the data.
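To make the idea in the abstract concrete, here is a minimal Python sketch of relationship-aware duplicate detection. It is not the algorithm from the paper: it simply scores each candidate pair as a weighted sum of the string similarity of the records' own descriptions and the Jaccard overlap of their related objects (for a movie, its set of actors). The function names `detect_duplicates`, `string_sim`, and `jaccard`, the parameters `alpha` and `threshold`, and the movie data are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def string_sim(a: str, b: str) -> float:
    """Similarity of the records' own descriptions, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a: set, b: set) -> float:
    """Overlap of the sets of related objects, in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def detect_duplicates(records: dict, related: dict,
                      alpha: float = 0.5, threshold: float = 0.7) -> list:
    """Hypothetical scorer: combine attribute similarity with relationship overlap.

    records: record id -> description string (e.g. a movie title)
    related: record id -> set of ids of related objects (e.g. actors)
    alpha:   weight of the description similarity; (1 - alpha) weighs the relationships
    """
    duplicates = []
    for x, y in combinations(sorted(records), 2):
        score = (alpha * string_sim(records[x], records[y])
                 + (1 - alpha) * jaccard(related[x], related[y]))
        if score >= threshold:
            duplicates.append((x, y))
    return duplicates


if __name__ == "__main__":
    records = {"m1": "Hamlet", "m2": "Hamlett", "m3": "Hamlet"}
    related = {"m1": {"a1", "a2"}, "m2": {"a1", "a2"}, "m3": {"a3", "a4"}}
    print(detect_duplicates(records, related))  # [('m1', 'm2')]
```

Two films titled "Hamlet" with disjoint casts (m1 and m3) score only 0.5 and stay separate, while the misspelled "Hamlett" (m2), which shares its cast with m1, is reported as a duplicate of m1. A comparison of titles within a single table would have matched the two unrelated "Hamlet" records instead.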
Pages: 231-234
Number of pages: 4
Related papers
  • [1] Data Duplicate Detection
    Medidar, Nikita
    Chavan, Manik
    2018 9TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2018,
  • [2] Data Preparation for Duplicate Detection
    Koumarelas, Ioannis
    Jiang, Lan
    Naumann, Felix
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2020, 12 (03):
  • [3] Duplicate Detection in Probabilistic Data
    Panse, Fabian
    van Keulen, Maurice
    de Keijzer, Ander
    Ritter, Norbert
    2010 IEEE 26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDE 2010), 2010, : 179 - 182
  • [4] Duplicate Data Detection Using GNN
    Lu, Hanrong
    Chen, Xin
    Lan, Xuhui
    Zheng, Feng
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2016), 2016, : 167 - 170
  • [5] NEAR-DUPLICATE VIDEO DETECTION EXPLOITING NOISE RESIDUAL TRACES
    Lameri, Silvia
    Bondi, Luca
    Bestagini, Paolo
    Tubaro, Stefano
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 1497 - 1501
  • [6] A Survey Analysis On Duplicate Detection in Hierarchical Data
    Gaikwad, Shital
    Bogiri, Nagaraju
    2015 INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING (ICPC), 2015,
  • [7] Exploiting Sentence-Level Features for Near-Duplicate Document Detection
    Wang, Jenq-Haur
    Chang, Hung-Chi
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2009, 5839 : 205 - +
  • [8] Efficient and Effective Duplicate Detection in Hierarchical Data
    Leitao, Luis
    Calado, Pavel
    Herschel, Melanie
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (05) : 1028 - 1041
  • [9] FINGERPRINT BASED DUPLICATE DETECTION IN STREAMED DATA
    Singh, Amritpal
    Batra, Shalini
    COMPUTING AND INFORMATICS, 2018, 37 (06) : 1313 - 1338
  • [10] Efficient duplicate records detection method for massive data
    Pang, Xiongwen
    Yao, Zhanlin
    Li, Yongjun
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2010, 38 (02): 8 - 11