Robust and Scalable Entity Alignment in Big Data

Cited by: 2
Authors
Flamino, James [1]
Abriola, Christopher [2]
Zimmerman, Benjamin [2]
Li, Zhongheng [2]
Douglas, Joel [2]
Affiliations
[1] Rensselaer Polytech Inst, Dept Phys Appl Phys & Astrophy, Troy, NY 12180 USA
[2] Syst & Technol Res, Woburn, MA USA
Keywords
Graph alignment; clustering; MapReduce; CLUSTERING-ALGORITHM;
DOI
10.1109/BigData50022.2020.9378273
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Entity alignment has always had significant uses within a multitude of diverse scientific fields. In particular, the concept of matching entities across networks has grown in significance in the world of social science as communicative networks such as social media have expanded in scale and popularity. With the advent of big data, there is a growing need to provide analysis on graphs of massive scale. However, with millions of nodes and billions of edges, the idea of alignment between a myriad of graphs of similar scale using features extracted from potentially sparse or incomplete datasets becomes daunting. In this paper we will propose a solution to the issue of large-scale alignments in the form of a multi-step pipeline. Within this pipeline we introduce scalable feature extraction for robust temporal attributes, accompanied by novel and efficient clustering algorithms in order to find groupings of similar nodes across graphs. The features and their clusters are fed into a versatile alignment stage that accurately identifies partner nodes among millions of possible matches. Our results show that the pipeline can process large data sets, achieving efficient runtimes within the memory constraints.
Pages: 2526-2533
Number of pages: 8