Entity Resolution with Iterative Blocking

Authors
Whang, Steven Euijong [1]
Menestrina, David [1]
Koutrika, Georgia [1]
Theobald, Martin [1]
Garcia-Molina, Hector [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Keywords
entity resolution; blocking; iterative blocking
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process computes the similarity of every pair of records, which can be very expensive for large datasets. Blocking techniques improve ER performance by dividing the records into blocks, possibly in multiple ways, and comparing only records within the same block. However, most blocking techniques process blocks separately and do not exploit the results of other blocks. In this paper, we propose an iterative blocking framework in which the ER results of each block are propagated to subsequently processed blocks. Blocks are processed iteratively until no block contains any more matching records. Compared to simple blocking, iterative blocking may achieve higher accuracy, because propagating the ER results of one block to other blocks may produce additional record matches. Iterative blocking may also be more efficient, because work already done on one block can save processing time in other blocks. We implement a scalable iterative blocking system and demonstrate that iterative blocking can be more accurate and efficient than simple blocking for large datasets.
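The difference between simple and iterative blocking can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it includes none of their scalability techniques. The following are hypothetical stand-ins chosen for illustration:

- Records are modeled as sets of (attribute, value) pairs.
- The match rule is "share at least two attribute values".
- Merging is a set union.
- The blocking criteria `by_attr("name")` and `by_attr("zip")` each create one block per attribute value.

In the toy data, `r1` and `r2` match inside the name block. Their merged record carries both the zip and the phone of `r3`, so it matches `r3` in the zip block. Simple blocking misses this match because each block sees only the original records.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple

# A record is a set of (attribute, value) pairs (assumed representation).
Record = FrozenSet[Tuple[str, str]]


def matches(a: Record, b: Record) -> bool:
    # Hypothetical match rule: records share at least two attribute values.
    return len(a & b) >= 2


def merge(a: Record, b: Record) -> Record:
    # Hypothetical merge rule: keep every value from both records.
    return a | b


def by_attr(attr: str) -> Callable[[Record], List[str]]:
    # Hypothetical blocking criterion: one block key per value of `attr`.
    return lambda r: sorted(f"{attr}={v}" for (k, v) in r if k == attr)


def resolve_block(block: List[Record]) -> List[Record]:
    """Pairwise ER inside one block: merge matching pairs until none remain."""
    records = list(block)
    merged = True
    while merged:
        merged = False
        for a, b in combinations(records, 2):
            if matches(a, b):
                records.remove(a)
                records.remove(b)
                records.append(merge(a, b))
                merged = True
                break  # restart the scan over the updated block
    return records


def simple_blocking(records: List[Record], criteria) -> List[Record]:
    # Every block is built from the ORIGINAL records; results are not shared.
    outputs: List[Record] = []
    for key_fn in criteria:
        for key in sorted({k for r in records for k in key_fn(r)}):
            outputs.extend(resolve_block([r for r in records if key in key_fn(r)]))
    return outputs


def iterative_blocking(records: List[Record], criteria) -> List[Record]:
    current = list(records)
    changed = True
    while changed:  # repeat passes until no block yields a new match
        changed = False
        for key_fn in criteria:
            for key in sorted({k for r in current for k in key_fn(r)}):
                # Build the block from the CURRENT records, so merges made in
                # earlier blocks are reflected here.
                block = [r for r in current if key in key_fn(r)]
                resolved = resolve_block(block)
                if len(resolved) < len(block):
                    for r in block:
                        current.remove(r)
                    current.extend(resolved)
                    changed = True
    return current


r1 = frozenset({("name", "John"), ("zip", "94305"), ("email", "j@x.org")})
r2 = frozenset({("name", "John"), ("email", "j@x.org"), ("phone", "555-0100")})
r3 = frozenset({("name", "Jon"), ("zip", "94305"), ("phone", "555-0100")})
criteria = [by_attr("name"), by_attr("zip")]

simple_out = simple_blocking([r1, r2, r3], criteria)
iter_out = iterative_blocking([r1, r2, r3], criteria)

print(len(iter_out))                                      # 1
print(any(r1 <= o and r3 <= o for o in simple_out))       # False
```

Each change in `iterative_blocking` reduces the number of records, so the outer loop terminates. Rebuilding each block from the current record set is the simplest way to propagate results. It favors clarity over the efficiency a real system would need.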
Pages: 219-231
Page count: 13