Near Duplicate Detection in Relational Databases

Cited by: 0

Authors:

Bayrak, Ahmet Tugrul [1]

Yilmaz, Aykut Inan [1]

Yilmaz, Kemal Burak [1]

Duzagac, Remzi [1]

Yildiz, Olcay Taner [2]

Affiliations:

[1] ETSTUR, Veri Bilimi & Analit Bolumu, Istanbul, Turkey

[2] Isik Univ, Bilgisayar Muhendisligi Bolumu, Sile Istanbul, Turkey

Source:

2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU) | 2018

Keywords:

Machine Learning; Similarity Functions; Duplicate Record Detection;

DOI:

Not available

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

As the amount of data grows, the number of duplicate records in relational databases gradually increases. Duplicate records can cause inconsistencies in reports and analyses. To reduce the effects of this problem, we aim to detect duplicate records using machine learning algorithms with features derived from the similarity between records. We detected 28,412 duplicate records among 9,301,467 records. The detected duplicate rows were removed from the data source, making the data more consistent.
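
The abstract does not say which similarity functions or learning algorithms were used, so the following is only a minimal sketch of the general idea: turn a pair of records into a vector of per-field similarity scores that a classifier could consume. Everything here is an assumption for illustration: the field names, the sample records, the use of Python's `difflib.SequenceMatcher` as the similarity function, and the `looks_duplicate` threshold rule, which is a stand-in for a trained model and not the authors' method.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def pair_features(rec_a: dict, rec_b: dict, fields: list) -> list:
    """One similarity score per field; missing fields are treated as empty strings."""
    return [
        field_similarity(str(rec_a.get(f, "")), str(rec_b.get(f, "")))
        for f in fields
    ]


def looks_duplicate(features: list, threshold: float = 0.85) -> bool:
    """Hypothetical stand-in for a trained classifier: mean similarity vs. a threshold."""
    return sum(features) / len(features) >= threshold


# Hypothetical field names and records, not taken from the paper.
FIELDS = ["name", "city", "phone"]
rec_a = {"name": "Grand Hotel Istanbul", "city": "Istanbul", "phone": "0212 555 0101"}
rec_b = {"name": "Grand Hotel Istanbul ", "city": "istanbul", "phone": "0212 555 0101"}
rec_c = {"name": "Blue Bay Resort", "city": "Antalya", "phone": "0242 555 0199"}

print(looks_duplicate(pair_features(rec_a, rec_b, FIELDS)))  # near-identical pair
print(looks_duplicate(pair_features(rec_a, rec_c, FIELDS)))  # unrelated pair
```

In a learned setting, the feature vectors would be paired with duplicate or non-duplicate labels and passed to a classifier in place of the fixed threshold. Comparing every pair in a table of millions of rows is quadratic, so a practical pipeline would likely restrict comparisons to candidate pairs first; the abstract does not describe how the authors handled this.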


Pages: 4
