Extending the Bag Distance for String Similarity Search

Authors
Mergen S. [1]
Affiliation
[1] Departamento de Linguagens e Sistemas de Computação, Universidade Federal de Santa Maria, Avenida Roraima, Rio Grande do Sul, Santa Maria
Keywords
Bag Distance; Edit Distance; Metric spaces; String similarity
DOI
10.1007/s42979-022-01502-5
Abstract
In string similarity search, the Edit Distance is the preferred choice for indexes based on a metric space. However, the high distances between strings lead to indexes with low pruning factors, and computing the distances is time-consuming. An alternative is the Bag Distance, whose computational cost is lower. In this paper, we propose an extension of the Bag Distance, the Anagram Distance, that allows non-uniform costs. The extension is more compatible with the Edit Distance and its applications. We also transform the index space into one that uses the Anagram Distance as the metric function, leaving the Edit Distance computation to a validation phase. As we describe, the transformation increases the pruning factor of in-memory indexes, especially when the costs are non-uniform. Experiments report the improvements achieved during search, both in execution time and in the number of distance computations. © 2022, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
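The filter-then-validate idea in the abstract can be illustrated with a minimal sketch. It assumes the standard uniform-cost Bag Distance (the larger of the two multiset differences), which is a known lower bound on the unit-cost Edit Distance; it is not the paper's Anagram Distance, whose non-uniform cost definition is not given here. The function names `bag_distance`, `edit_distance`, and `range_search` are illustrative, and a plain linear scan stands in for the metric index.

```python
from collections import Counter


def bag_distance(a: str, b: str) -> int:
    """Uniform-cost Bag Distance: the larger of the two multiset differences."""
    ca, cb = Counter(a), Counter(b)
    only_a = sum((ca - cb).values())  # characters of a left unmatched by b
    only_b = sum((cb - ca).values())  # characters of b left unmatched by a
    return max(only_a, only_b)


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance using two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def range_search(query: str, words: list[str], radius: int):
    """Prune with the cheap lower bound, then validate with the Edit Distance.

    Returns the matching words and the number of Edit Distance computations.
    Assumption: a linear scan replaces the metric index used in the paper.
    """
    results, verified = [], 0
    for w in words:
        if bag_distance(query, w) > radius:
            continue  # lower bound already exceeds the radius: safe to skip
        verified += 1
        if edit_distance(query, w) <= radius:
            results.append(w)
    return results, verified
```

With the query `"listen"`, radius 1, and the candidates `["listen", "silent", "listens", "banana", "list"]`, the filter discards `"banana"` and `"list"` without computing an Edit Distance. The anagram `"silent"` passes the filter with a Bag Distance of 0 and is rejected only in the validation phase, which shows why validation is still needed.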