Optimal Transport-based Alignment of Learned Character Representations for String Similarity

Cited by: 0
Authors
Tam, Derek [1]
Monath, Nicholas [1]
Kobren, Ari [1]
Traylor, Aaron [2]
Das, Rajarshi [1]
McCallum, Andrew [1]
Affiliations
[1] Univ Massachusetts, Coll Informat & Comp Sci, Amherst, MA 01003 USA
[2] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
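Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract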
String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn Iteration (alignment is posed as an instance of optimal transport), and scores the alignment with a convolutional neural network. We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE (or one of its variants) outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE's ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in B³ F1 over the previous state-of-the-art approach.
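The alignment step named in the abstract, Sinkhorn iteration for entropy-regularized optimal transport, can be illustrated with a minimal sketch. This is not the STANCE implementation: the 0/1 character-mismatch cost stands in for distances between learned character encodings, the marginals are assumed uniform, and the names `char_cost` and `sinkhorn` and the parameters `reg` and `n_iters` are hypothetical choices made for this example.

```python
import math


def char_cost(s, t):
    """Toy cost matrix (assumption): 0 for identical characters, 1 otherwise.

    A learned model would instead compare character encodings.
    """
    return [[0.0 if cs == ct else 1.0 for ct in t] for cs in s]


def sinkhorn(cost, reg=0.5, n_iters=500):
    """Entropy-regularized optimal transport with uniform marginals.

    cost: n x m list of rows holding pairwise costs.
    Returns an n x m transport plan; rows sum to about 1/n, columns to 1/m.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # source marginal (assumed uniform)
    b = [1.0 / m] * m  # target marginal (assumed uniform)
    # Gibbs kernel: low cost gives a large kernel value.
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


plan = sinkhorn(char_cost("abc", "abd"))
```

In the resulting plan, matching characters (for example the two `a`s) receive more transport mass than mismatched pairs, which is the soft alignment a downstream scorer would consume.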
Pages: 5907-5917
Page count: 11
Related papers
50 in total
  • [31] Optimal Transport-based Coverage Control for Swarm Robot Systems: Generalization of the Voronoi Tessellation-based Method
    Inoue, Daisuke
    Ito, Yuji
    Yoshida, Hiroaki
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021: 3032-3037
  • [32] Optimal transport-based unsupervised semantic disentanglement: A novel approach for efficient image editing in GANs
    Liu, Yunqi
    Ouyang, Xue
    Jiang, Tian
    Ding, Hongwei
    Cui, Xiaohui
    DISPLAYS, 2023, 80
  • [33] Promise and Limitations of Supervised Optimal Transport-Based Graph Summarization via Information Theoretic Measures
    Neshatfar, Sepideh
    Magner, Abram
    Sekeh, Salimeh Yasaei
    IEEE ACCESS, 2023, 11: 87533-87542
  • [34] Optimal Transport-Based Coverage Control for Swarm Robot Systems: Generalization of the Voronoi Tessellation-Based Method
    Inoue, Daisuke
    Ito, Yuji
    Yoshida, Hiroaki
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (04): 1483-1488
  • [35] Sample-prototype optimal transport-based universal domain adaptation for remote sensing image classification
    Chen, Xiaosong
    Yang, Yongbo
    Liu, Dong
    Wang, Shengsheng
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (01)
  • [36] Protocol to denoise spatially resolved transcriptomics data utilizing optimal transport-based gene filtering algorithm
    Du, Lin
    Kang, Jingmin
    Li, Jie
    Qin, Hua
    Hou, Yong
    Sun, Hai-Xi
    STAR PROTOCOLS, 2025, 6 (01)
  • [37] SpotGF: Denoising spatially resolved transcriptomics data using an optimal transport-based gene filtering algorithm
    Du, Lin
    Kang, Jingmin
    Hou, Yong
    Sun, Hai-Xi
    Zhang, Bohan
    CELL SYSTEMS, 2024, 15 (10)
  • [38] Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction
    Xu, Yingxue
    Chen, Hao
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 21184-21194
  • [39] A novel optimal transport-based approach for interpolating spectral time series: Paving the way for photometric classification of supernovae
    Ramirez, Mauricio
    Pignata, Giuliano
    Förster, Francisco
    González-Gaitán, Santiago
    Gutiérrez, Claudia P.
    Ayala, Bastian
    Cabrera-Vives, Guillermo
    Catelan, Márcio
    Arancibia, Alejandra M. Muñoz
    Pineda-García, Jonathan
    Astronomy and Astrophysics, 2024, 691
  • [40] Optimal transport-based dictionary learning and its application to Euclid-like Point Spread Function representation
    Schmitz, Morgan A.
    Heitz, Matthieu
    Bonneel, Nicolas
    Ngole, Fred
    Coeurjolly, David
    Cuturi, Marco
    Peyre, Gabriel
    Starck, Jean-Luc
    WAVELETS AND SPARSITY XVII, 2017, 10394