Incremental Blocking for Entity Resolution over Web Streaming Data

Cited by: 4
Authors
Araujo, Tiago Brasileiro [1, 2]
Stefanidis, Kostas [1]
Santos Pires, Carlos Eduardo [2]
Nummenmaa, Jyrki [1]
da Nobrega, Thiago Pereira [3]
Affiliations
[1] Tampere Univ, Tampere, Finland
[2] Univ Fed Campina Grande, Campina Grande, Paraiba, Brazil
[3] State Univ Paraiba, Campina Grande, Paraiba, Brazil
Keywords
entity resolution; heterogeneous data; incremental processing
DOI
10.1145/3350546.3352542
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The widespread use of information systems has become a valuable source of semi-structured data. In this context, Entity Resolution (ER) emerges as a fundamental task to integrate multiple knowledge bases or identify similarities between data items (i.e., entities). Since ER is an inherently quadratic task, blocking techniques are often used to improve efficiency. Beyond the challenges related to the data volume and heterogeneity, blocking techniques also face two other challenges: streaming data and incremental processing. To address these challenges, we propose PRIME, a novel incremental schema-agnostic blocking technique that utilizes parallelism to enhance blocking efficiency. The proposed technique deals with streaming and incremental data using a distributed computational infrastructure. To improve efficiency, the technique avoids unnecessary comparisons and applies a time window strategy to prevent excessive memory consumption.
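The abstract names two ideas that are easier to see in code: schema-agnostic blocking, and a time window that bounds memory. What follows is a minimal single-process sketch of those two ideas only. It is not the PRIME implementation and leaves out the parallel, distributed part entirely. The class name WindowedTokenBlocker, the window parameter and the whitespace tokenizer are assumptions made for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class WindowedTokenBlocker:
    """Toy incremental token blocking with a sliding time window (not PRIME itself)."""

    def __init__(self, window):
        self.window = window              # hypothetical window length, in time units
        self.blocks = defaultdict(set)    # token -> ids of entities currently in that block
        self.arrivals = deque()           # (timestamp, entity_id, tokens) in arrival order

    @staticmethod
    def tokens(entity):
        # Schema-agnostic: attribute names are ignored, every value is tokenized.
        return {t for v in entity.values() for t in str(v).lower().split()}

    def _evict(self, now):
        # Drop entities older than the window so the index stays bounded.
        while self.arrivals and now - self.arrivals[0][0] > self.window:
            _, old_id, old_tokens = self.arrivals.popleft()
            for tok in old_tokens:
                self.blocks[tok].discard(old_id)
                if not self.blocks[tok]:
                    del self.blocks[tok]

    def add(self, entity_id, entity, now):
        """Index one arriving entity; return ids sharing at least one token with it."""
        self._evict(now)
        toks = self.tokens(entity)
        candidates = set()
        for tok in toks:
            candidates |= self.blocks[tok]   # only entities in a shared block are compared
            self.blocks[tok].add(entity_id)
        self.arrivals.append((now, entity_id, toks))
        return candidates


blocker = WindowedTokenBlocker(window=10)
first = blocker.add("a", {"name": "Tampere University"}, now=0)
second = blocker.add("b", {"title": "University of Tampere"}, now=5)
third = blocker.add("c", {"org": "tampere univ"}, now=20)
```

In this run, "b" is paired with "a" even though the two records use different attribute names, because both contain the token "tampere". By the time "c" arrives, "a" and "b" have fallen out of the 10-unit window, so "c" gets no candidates and the index holds only its own tokens.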
Pages: 332-336
Number of pages: 5