Incremental Blocking for Entity Resolution over Web Streaming Data

Cited by: 4
Authors
Araujo, Tiago Brasileiro [1 ,2 ]
Stefanidis, Kostas [1 ]
Santos Pires, Carlos Eduardo [2 ]
Nummenmaa, Jyrki [1 ]
da Nobrega, Thiago Pereira [3 ]
Affiliations
[1] Tampere Univ, Tampere, Finland
[2] Univ Fed Campina Grande, Campina Grande, Paraiba, Brazil
[3] State Univ Paraiba, Campina Grande, Paraiba, Brazil
Keywords
entity resolution; heterogeneous data; incremental processing
DOI
10.1145/3350546.3352542
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The widespread use of information systems has become a valuable source of semi-structured data. In this context, Entity Resolution (ER) emerges as a fundamental task to integrate multiple knowledge bases or identify similarities between data items (i.e., entities). Since ER is an inherently quadratic task, blocking techniques are often used to improve efficiency. Beyond the challenges related to the data volume and heterogeneity, blocking techniques also face two other challenges: streaming data and incremental processing. To address these challenges, we propose PRIME, a novel incremental schema-agnostic blocking technique that utilizes parallelism to enhance blocking efficiency. The proposed technique deals with streaming and incremental data using a distributed computational infrastructure. To improve efficiency, the technique avoids unnecessary comparisons and applies a time window strategy to prevent excessive memory consumption.
Pages: 332-336
Number of pages: 5
Related papers
50 in total
  • [1] Incremental Entity Blocking over Heterogeneous Streaming Data
    Araujo, Tiago Brasileiro
    Stefanidis, Kostas
    Santos Pires, Carlos Eduardo
    Nummenmaa, Jyrki
    da Nobrega, Thiago Pereira
    INFORMATION, 2022, 13 (12)
  • [2] TREATS: Fairness-aware entity resolution over streaming data
    Araujo, Tiago Brasileiro
    Efthymiou, Vasilis
    Christophides, Vassilis
    Pitoura, Evaggelia
    Stefanidis, Kostas
    INFORMATION SYSTEMS, 2025, 129
  • [3] A Blocking Scheme for Entity Resolution in the Semantic Web
    Costa, Gustavo de Assis
    Parente de Oliveira, Jose Maria
    IEEE 30TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS IEEE AINA 2016, 2016, : 1138 - 1145
  • [4] Incremental entity resolution on rules and data
    Whang, Steven Euijong
    Garcia-Molina, Hector
    VLDB JOURNAL, 2014, 23 (01): : 77 - 102
  • [5] Incremental entity resolution on rules and data
    Steven Euijong Whang
    Hector Garcia-Molina
    The VLDB Journal, 2014, 23 : 77 - 102
  • [6] Entity resolution framework using rough set blocking for heterogeneous web of data
    Vidhya, K. A.
    Geetha, T. V.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2018, 34 (01) : 659 - 675
  • [7] Entity Resolution in the Web of Data
    Stefanidis, Kostas
    Efthymiou, Vasilis
    Herschel, Melanie
    Christophides, Vassilis
    WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2014, : 203 - 203
  • [8] Entity Resolution in the Web of Data
    Department of Computer Science, University of Crete, Greece
    Unknown
    Unknown
    Synth. lect. semant. web : theory technol., 3 (1-124):
  • [9] Incremental entity resolution process over query results for data integration systems
    Priscilla Kelly Machado Vieira
    Bernadette Farias Lóscio
    Ana Carolina Salgado
    Journal of Intelligent Information Systems, 2019, 52 : 451 - 471
  • [10] Incremental entity resolution process over query results for data integration systems
    Machado Vieira, Priscilla Kelly
    Loscio, Bernadette Farias
    Salgado, Ana Carolina
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2019, 52 (02) : 451 - 471