Geospatial Entity Resolution

Cited by: 4
Authors
Balsebre, Pasquale [1]
Yao, Dezhong [2]
Cong, Gao [1]
Hai, Zhen [3]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[3] Alibaba Grp, DAMO Acad, Singapore, Singapore
Funding
National Natural Science Foundation of China;
Keywords
Entity resolution; neural networks; geospatial data; neighbourhood embedding; graph attention;
DOI
10.1145/3485447.3512026
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Geospatial databases are at the core of an ever-increasing number of services. Building and maintaining them remains challenging because information from multiple providers must be merged. Entity Resolution (ER) consists of finding entity mentions from different sources that refer to the same real-world entity. In geospatial ER, entities are often represented using different schemes and are subject to incomplete information and inaccurate locations, making ER and deduplication daunting tasks. While tremendous advances have been made in traditional entity resolution and natural language processing, geospatial data integration approaches still rely heavily on static similarity measures and human-designed rules. To link geospatial data automatically, a unified representation of entities with heterogeneous attributes and their geographical context is needed. To this end, we propose Geo-ER, a joint framework that combines Transformer-based language models, which have been successfully applied in ER, with a novel learning-based architecture to represent the geospatial character of the entity. Unlike existing solutions, Geo-ER does not rely on pre-defined rules and is able to capture information from surrounding entities in order to make accurate, context-based predictions. Extensive experiments on eight real-world datasets demonstrate the effectiveness of our solution over state-of-the-art methods. Moreover, Geo-ER proves to be robust in settings where no training data is available for a specific city.
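For orientation, the kind of static, rule-based matcher that the abstract contrasts Geo-ER with can be sketched as follows. This is not Geo-ER and not code from the paper: it is a minimal illustration, assuming each entity is a dict with hypothetical `name`, `lat`, and `lon` keys, and the thresholds `max_dist_m` and `min_name_sim` are arbitrary values chosen for the example.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Static string similarity in [0, 1] on lower-cased, trimmed names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def rule_based_match(e1, e2, max_dist_m=100.0, min_name_sim=0.8):
    """Hand-designed rule: close in space AND similar in name.

    The entity keys and both thresholds are assumptions for illustration.
    """
    dist = haversine_m(e1["lat"], e1["lon"], e2["lat"], e2["lon"])
    return dist <= max_dist_m and name_similarity(e1["name"], e2["name"]) >= min_name_sim


if __name__ == "__main__":
    a = {"name": "Marina Bay Sands", "lat": 1.2834, "lon": 103.8607}
    b = {"name": "Marina Bay Sands Hotel", "lat": 1.2836, "lon": 103.8609}
    c = {"name": "Raffles Hotel", "lat": 1.2949, "lon": 103.8545}
    print(rule_based_match(a, b))
    print(rule_based_match(a, c))
```

Fixed thresholds like these have to be tuned by hand and ignore an entity's surroundings, which is the limitation the abstract attributes to rule-based approaches.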
Pages: 3061-3070
Number of pages: 10