Robust and Scalable Entity Alignment in Big Data

Cited by: 2
Authors
Flamino, James [1]
Abriola, Christopher [2]
Zimmerman, Benjamin [2]
Li, Zhongheng [2]
Douglas, Joel [2]
Affiliations
[1] Rensselaer Polytech Inst, Dept Phys Appl Phys & Astrophy, Troy, NY 12180 USA
[2] Syst & Technol Res, Woburn, MA USA
Keywords
Graph alignment; clustering; MapReduce; clustering algorithm
DOI
10.1109/BigData50022.2020.9378273
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Entity alignment has long had significant applications across many scientific fields. In particular, matching entities across networks has grown in importance in the social sciences as communication networks such as social media have expanded in scale and popularity. With the advent of big data, there is a growing need to analyze graphs of massive scale. However, with millions of nodes and billions of edges, aligning many graphs of similar scale using features extracted from potentially sparse or incomplete datasets becomes daunting. In this paper, we propose a solution to large-scale alignment in the form of a multi-step pipeline. Within this pipeline, we introduce scalable feature extraction for robust temporal attributes, accompanied by novel and efficient clustering algorithms that find groupings of similar nodes across graphs. The features and their clusters are fed into a versatile alignment stage that accurately identifies partner nodes among millions of possible matches. Our results show that the pipeline can process large datasets with efficient runtimes while staying within memory constraints.
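The abstract describes the pipeline only at a high level. To illustrate why grouping similar nodes before alignment helps at scale, here is a minimal single-machine Python sketch under stated assumptions; it is not the authors' method, which is distributed (MapReduce is listed among the keywords) and uses its own clustering algorithms. The sketch assumes each node carries a list of event times in hours (a hypothetical input format), uses a normalized hour-of-day activity histogram as the temporal feature, substitutes a trivial blocking key (the node's peak activity hour) for clustering, and matches nodes greedily by cosine similarity within each block. The names `hour_histogram`, `block_key`, `align`, and the `threshold` parameter are all hypothetical.

```python
import math
from collections import defaultdict

HOURS = 24  # bins of the hour-of-day histogram


def hour_histogram(event_hours):
    """Hypothetical temporal feature: normalized hour-of-day activity histogram.

    `event_hours` is a list of event times in hours since an arbitrary epoch
    (an assumed input format). Returns 24 values summing to 1, or all zeros
    if the node has no events.
    """
    counts = [0.0] * HOURS
    for t in event_hours:
        counts[int(t) % HOURS] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def block_key(hist):
    """Stand-in for clustering: the peak activity hour (first one on ties)."""
    return max(range(HOURS), key=lambda h: hist[h])


def align(graph_a, graph_b, threshold=0.8):
    """Greedily match each node of graph_a to its most similar node of
    graph_b, comparing only nodes that share a block key. Inputs map
    node id -> list of event hours; returns {node_a: node_b}."""
    feats_a = {n: hour_histogram(ts) for n, ts in graph_a.items()}
    feats_b = {n: hour_histogram(ts) for n, ts in graph_b.items()}

    # Index graph_b by block so each lookup scans one block, not all of B.
    blocks_b = defaultdict(list)
    for n, hist in feats_b.items():
        blocks_b[block_key(hist)].append(n)

    matches = {}
    for n, hist in feats_a.items():
        best, best_sim = None, threshold
        for m in blocks_b.get(block_key(hist), []):
            sim = cosine(hist, feats_b[m])
            # Accept if at or above the threshold and at least as good as the best so far.
            if sim >= best_sim:
                best, best_sim = m, sim
        if best is not None:
            matches[n] = best
    return matches


if __name__ == "__main__":
    a = {"alice": [9, 10, 33, 34], "bob": [22, 23, 46]}
    b = {"u1": [57, 58], "u2": [70, 71]}
    print(align(a, b))  # prints {'alice': 'u1', 'bob': 'u2'}
```

Restricting comparisons to shared blocks keeps the candidate set far smaller than all |A| x |B| pairs; the trade-off is that true partners placed in different blocks can never be matched. The greedy matching is also not one-to-one: two nodes in one graph may map to the same partner.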
Pages: 2526-2533
Number of pages: 8
Related papers
  • [1] SEA: A Scalable Entity Alignment System
    Wu, Junyang
    Li, Tianyi
    Chen, Lu
    Gao, Yunjun
    Wei, Ziheng
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 3175 - 3179
  • [2] Robust and Scalable Column/Row Sampling from Corrupted Big Data
    Rahmani, Mostafa
    Atia, George
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2017), 2017, : 1818 - 1826
  • [3] Entity Resolution for Big Data
    Getoor, Lise
    Machanavajjhala, Ashwin
    19TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'13), 2013, : 1525 - 1525
  • [4] On Scalable and Robust Truth Discovery in Big Data Social Media Sensing Applications
    Zhang, Daniel
    Wang, Dong
    Vance, Nathan
    Zhang, Yang
    Mike, Steven
    IEEE TRANSACTIONS ON BIG DATA, 2019, 5 (02) : 195 - 208
  • [5] Scalable data summarization on big data
    Li, Feifei
    Nath, Suman
    DISTRIBUTED AND PARALLEL DATABASES, 2014, 32 (03) : 313 - 314
  • [6] Overlapped Hashing: A Novel Scalable Blocking Technique for Entity Resolution in Big-Data Era
    Khalil, Rana
    Shawish, Ahmed
    Elzanfaly, Doaa
    INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 427 - 441
  • [7] Scalable Mining of Big Data
    Leung, Carson K.
    Pazdor, Adam G. M.
    Zheng, Hao
    2021 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, INTERNET OF PEOPLE, AND SMART CITY INNOVATIONS (SMARTWORLD/SCALCOM/UIC/ATC/IOP/SCI 2021), 2021, : 240 - 247
  • [8] Entity Resolution in a Big Data Framework
    Kejriwal, Mayank
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 4243 - 4244
  • [9] SEDEX: Scalable Entity Preserving Data Exchange
    Sekhavat, Yoones A.
    Parsons, Jeffrey
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (07) : 1878 - 1890