Near Duplicate Detection in Relational Databases

Cited by: 0
Authors
Bayrak, Ahmet Tugrul [1]
Yilmaz, Aykut Inan [1]
Yilmaz, Kemal Burak [1]
Duzagac, Remzi [1]
Yildiz, Olcay Taner [2]
Affiliations
[1] ETSTUR, Veri Bilimi & Analit Bolumu, Istanbul, Turkey
[2] Isik Univ, Bilgisayar Muhendisligi Bolumu, Sile Istanbul, Turkey
Keywords
Machine Learning; Similarity Functions; Duplicate Record Detection
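DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809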
Abstract
As the amount of data grows, the number of duplicate records in relational databases gradually increases. Duplicate records can cause inconsistencies in reports and analyses. To reduce the effects of this problem, we aim to detect duplicate records using machine learning algorithms with features derived from the similarity of the records. We detected 28,412 duplicate records among 9,301,467 records. The detected duplicate rows were removed from the data source, making the data more consistent.
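The abstract does not say which similarity functions or learning algorithms were used. As a minimal sketch of the general idea only, the Python below turns a pair of records into a vector of per-field similarity scores that a classifier could consume. The field names, the sample records, and the choice of `difflib` ratio and token Jaccard overlap are all assumptions made for illustration, not the authors' method.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] from difflib on trimmed, lowercased strings."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercased, whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty values are treated as identical
    return len(ta & tb) / len(ta | tb)


def pair_features(rec_a: dict, rec_b: dict, fields: list) -> list:
    """Build one feature vector (two scores per field) for a record pair."""
    features = []
    for field in fields:
        a, b = str(rec_a.get(field, "")), str(rec_b.get(field, ""))
        features.append(string_similarity(a, b))
        features.append(token_jaccard(a, b))
    return features


# Hypothetical records; the field names are illustrative only.
r1 = {"name": "Grand Hotel Istanbul", "city": "Istanbul"}
r2 = {"name": "grand hotel  istanbul", "city": "Istanbul"}
r3 = {"name": "Seaside Resort", "city": "Antalya"}

near_dup = pair_features(r1, r2, ["name", "city"])  # scores close to 1
distinct = pair_features(r1, r3, ["name", "city"])  # scores close to 0
```

In a setup like the one the abstract describes, vectors such as `near_dup` and `distinct` would be paired with labels (duplicate or not) to train a classifier. At the scale of millions of rows, comparing every pair is impractical, so candidate pairs would normally be restricted first, for example by grouping on a shared key.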
Pages: 4