Improved algorithms for approximate string matching (extended abstract)

Cited by: 2
Authors
Papamichail, Dimitris [1]
Papamichail, Georgios [2]
Affiliations
[1] Univ Miami, Dept Comp Sci, Miami, FL USA
[2] Natl Ctr Publ Adm, Athens, Greece
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
DISTANCES;
DOI
10.1186/1471-2105-10-S1-S10
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants of the problem, including comparison of two strings, approximate pattern identification in a string, or calculation of the longest common subsequence that two strings share.
Results: We designed an output-sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s - |n - m|) · min(m, n, s) + m + n) and linear space, where s is the edit distance between the two strings. This worst-case time bound sets the quadratic factor of the algorithm independent of the longest string length and improves existing theoretical bounds for this problem. The implementation of our algorithm also excels in practice, especially in cases where the two strings compared differ significantly in length.
Conclusion: We have provided the design, analysis and implementation of a new algorithm for calculating the edit distance of two strings with both theoretical and practical implications. Source code of our algorithm is available online.
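To make "output-sensitive" concrete, here is a minimal Python sketch of the classic threshold-doubling banded dynamic program (in the style usually attributed to Ukkonen). It is not the authors' algorithm and does not achieve the bound stated in the abstract. It only illustrates the general idea that the work can scale with the edit distance s rather than with the product of the string lengths. The function names `bounded_edit_distance` and `edit_distance` are illustrative, and unit costs for insertion, deletion and substitution are an assumption.

```python
def bounded_edit_distance(a, b, k):
    """Return the unit-cost edit distance of a and b if it is <= k, else None.

    Only cells with |i - j| <= k are filled, so each row costs O(k) work.
    """
    m, n = len(a), len(b)
    if abs(n - m) > k:
        return None  # at least |n - m| insertions/deletions are required
    inf = k + 1  # any value above k just means "more than k"
    prev = [inf] * (n + 1)
    cur = [inf] * (n + 1)
    for j in range(min(n, k) + 1):
        prev[j] = j  # row 0: build b[:j] by j insertions
    for i in range(1, m + 1):
        lo = max(0, i - k)
        hi = min(n, i + k)
        if lo > 0:
            cur[lo - 1] = inf  # cell just left of the band is out of range
        for j in range(lo, hi + 1):
            if j == 0:
                best = i  # delete the first i characters of a
            else:
                best = min(
                    prev[j - 1] + (a[i - 1] != b[j - 1]),  # match or substitute
                    prev[j] + 1,                           # delete a[i-1]
                    cur[j - 1] + 1,                        # insert b[j-1]
                )
            cur[j] = min(best, inf)
        prev, cur = cur, prev
    return prev[n] if prev[n] <= k else None


def edit_distance(a, b):
    """Double the threshold k until the banded computation succeeds."""
    if len(a) > len(b):
        a, b = b, a  # iterate rows over the shorter string
    k = max(1, len(b) - len(a))  # the distance is at least the length gap
    while True:
        d = bounded_edit_distance(a, b, k)
        if d is not None:
            return d
        k *= 2
```

For example, `edit_distance("kitten", "sitting")` returns 3. Because the threshold doubles each round, the band work is dominated by the last round and is roughly proportional to s · min(m, n), plus linear array setup per round. The length gap |n - m| appears as the starting threshold here, whereas the bound in the abstract subtracts it from s inside the quadratic term.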
Pages: 11