Entity Resolution with Iterative Blocking

Authors
Whang, Steven Euijong [1]
Menestrina, David [1]
Koutrika, Georgia [1]
Theobald, Martin [1]
Garcia-Molina, Hector [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Keywords
entity resolution; blocking; iterative blocking
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process involves computing the similarities between pairs of records, which can be very expensive for large datasets. Various blocking techniques can be used to enhance the performance of ER by dividing the records into blocks in multiple ways and only comparing records within the same block. However, most blocking techniques process blocks separately and do not exploit the results of other blocks. In this paper, we propose an iterative blocking framework where the ER results of blocks are reflected to subsequently processed blocks. Blocks are now iteratively processed until no block contains any more matching records. Compared to simple blocking, iterative blocking may achieve higher accuracy because reflecting the ER results of blocks to other blocks may generate additional record matches. Iterative blocking may also be more efficient because processing a block now saves the processing time for other blocks. We implement a scalable iterative blocking system and demonstrate that iterative blocking can be more accurate and efficient than blocking for large datasets.
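The loop the abstract describes (process a block, propagate any merges to the other blocks, repeat until no block yields a new match) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record layout (attribute name mapped to a set of values), the `match` rule (shared email or phone), the `merge` rule (union of values), and the union-find bookkeeping are all assumptions chosen for illustration.

```python
from collections import deque
from itertools import combinations


def merge(a, b):
    """Assumed merge rule: union the value sets of both records."""
    return {k: a.get(k, set()) | b.get(k, set()) for k in a.keys() | b.keys()}


def match(a, b, keys=("email", "phone")):
    """Assumed match rule: records match if they share a value on any key."""
    return any(a.get(k, set()) & b.get(k, set()) for k in keys)


def iterative_blocking(records, blocks):
    """records: dict id -> record; blocks: list of sets of record ids.

    Returns a dict mapping each cluster (sorted tuple of ids) to its
    merged record.
    """
    parent = {rid: rid for rid in records}   # union-find forest over ids
    merged = dict(records)                   # cluster root -> merged record

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    # Which blocks each cluster root currently appears in.
    blocks_of = {rid: set() for rid in records}
    for i, block in enumerate(blocks):
        for rid in block:
            blocks_of[rid].add(i)

    queue = deque(range(len(blocks)))
    queued = set(queue)
    while queue:
        i = queue.popleft()
        queued.discard(i)
        changed = True
        while changed:                       # rerun until block i is stable
            changed = False
            roots = sorted({find(r) for r in blocks[i]})
            for x, y in combinations(roots, 2):
                if match(merged[x], merged[y]):
                    parent[y] = x
                    merged[x] = merge(merged[x], merged.pop(y))
                    blocks_of[x] |= blocks_of.pop(y)
                    # Reflect the merge: revisit every other block that
                    # contains the merged cluster.
                    for j in blocks_of[x]:
                        if j != i and j not in queued:
                            queue.append(j)
                            queued.add(j)
                    changed = True
                    break

    clusters = {}
    for rid in records:
        clusters.setdefault(find(rid), []).append(rid)
    return {tuple(sorted(ids)): merged[root] for root, ids in clusters.items()}


# Records 1 and 3 share a block but do not match directly; they match only
# after record 1 has absorbed record 2's phone number in the other block.
records = {
    1: {"name": {"Alice"}, "email": {"a@x.com"}},
    2: {"email": {"a@x.com"}, "phone": {"555-1"}},
    3: {"name": {"Alice"}, "phone": {"555-1"}},
}
blocks = [{1, 3}, {1, 2}]
print(list(iterative_blocking(records, blocks)))  # prints [(1, 2, 3)]
```

In this toy input, processing each block once in isolation would find only the match between records 1 and 2, because records 2 and 3 never share a block. Revisiting the first block after the merge is what recovers the third record, which is the accuracy gain the abstract attributes to iterative blocking.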
Pages: 219-231
Page count: 13