Improved algorithms for approximate string matching (extended abstract)

Cited by: 2
Authors
Papamichail, Dimitris [1]
Papamichail, Georgios [2]
Affiliations
[1] Univ Miami, Dept Comp Sci, Miami, FL USA
[2] Natl Ctr Publ Adm, Athens, Greece
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
DISTANCES;
DOI
10.1186/1471-2105-10-S1-S10
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The problem of approximate string matching is important in many different areas, such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants of the problem, including comparison of two strings, approximate pattern identification in a string, and calculation of the longest common subsequence that two strings share. Results: We designed an output-sensitive algorithm solving the edit distance problem between two strings of lengths n and m, respectively, in time O((s - |n - m|) · min(m, n, s) + m + n) and linear space, where s is the edit distance between the two strings. This worst-case time bound sets the quadratic factor of the algorithm independent of the longest string length and improves existing theoretical bounds for this problem. The implementation of our algorithm also excels in practice, especially in cases where the two strings compared differ significantly in length. Conclusion: We have provided the design, analysis and implementation of a new algorithm for calculating the edit distance of two strings, with both theoretical and practical implications. Source code of our algorithm is available online.
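For orientation, the edit distance problem named in the abstract can be illustrated with the classical dynamic-programming baseline. The sketch below is not the authors' output-sensitive algorithm; it is the textbook two-row recurrence, which runs in O(mn) time and O(min(m, n)) space, and the function name `edit_distance` is a hypothetical one chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook edit (Levenshtein) distance with unit-cost
    insertions, deletions and substitutions.

    Baseline only: O(len(a) * len(b)) time, O(min(len(a), len(b))) space.
    This is NOT the output-sensitive algorithm described in the abstract.
    """
    # Edit distance is symmetric, so keep the shorter string in the
    # inner dimension; each row then has min(m, n) + 1 cells.
    if len(a) < len(b):
        a, b = b, a

    # prev[j] = distance between the first i-1 characters of a
    # and the first j characters of b (initially i-1 = 0).
    prev = list(range(len(b) + 1))

    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur

    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The quadratic factor here depends on both string lengths regardless of how similar the strings are, which is the cost that the bound stated in the abstract ties to the edit distance s instead.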
Pages: 11
Related papers
50 in total
  • [41] Improved pattern-scan-order algorithms for string matching
    Ryu, Cheol
    Park, Kunsoo
    JOURNAL OF DISCRETE ALGORITHMS, 2018, 49 : 27 - 36
  • [42] Approximate string matching algorithms for limited-vocabulary OCR output correction
    Lasko, TA
    Hauser, SE
    DOCUMENT RECOGNITION AND RETRIEVAL VIII, 2001, 4307 : 232 - 240
  • [43] Improved approximation algorithms for unsplittable flow problems (extended abstract)
    Kolliopoulos, SG
    Stein, C
    38TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 1997, : 426 - 435
  • [44] New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance (vol 74, pg 1815, 2018)
    Ho, ThienLuan
    Oh, Seung-Rohk
    Kim, HyunJin
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (05): : 1835 - 1835
  • [45] Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
    Bille, Philip
    Fagerberg, Rolf
    Gortz, Inge Li
    ACM TRANSACTIONS ON ALGORITHMS, 2009, 6 (01)
  • [46] Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts
    Bille, Philip
    Fagerberg, Rolf
    Gortz, Inge Li
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2007, 4580 : 52 - +
  • [47] A very fast string matching algorithm for small alphabets and long patterns (Extended abstract)
    Charras, C
    Lecroq, T
    Pehoushek, JD
    COMBINATORIAL PATTERN MATCHING, 1998, 1448 : 55 - 64
  • [48] APPROXIMATE STRING MATCHING - INVESTIGATIONS WITH A HARDWARE STRING COMPARATOR
    OWOLABI, O
    FERGUSON, JD
    LECTURE NOTES IN COMPUTER SCIENCE, 1988, 301 : 536 - 545
  • [49] Approximate string matching with swap and mismatch
    Lipsky, Ohad
    Porat, Benny
    Porat, Elly
    Shalom, B. Riva
    Tzur, Asaf
    ALGORITHMS AND COMPUTATION, 2007, 4835 : 869 - +
  • [50] A Consensus Algorithm for Approximate String Matching
    Rubio, Miguel
    Alba, Alfonso
    Mendez, Martin
    Arce-Santana, Edgar
    Rodriguez-Kessler, Margarita
    3RD IBEROAMERICAN CONFERENCE ON ELECTRONICS ENGINEERING AND COMPUTER SCIENCE, CIIECC 2013, 2013, 7 : 322 - 327