We address the problem of approximate string matching in two dimensions: finding a pattern of size $m \times m$ in a text of size $n \times n$ with at most $k$ errors (substitutions, insertions and deletions). Although the problem can be solved using dynamic programming in time $O(m^2 n^2)$, this is in general too expensive for small $k$. We therefore design a filtering algorithm that avoids verifying most of the text with dynamic programming. The filter is based on a one-dimensional multi-pattern approximate search algorithm. The average complexity of the resulting algorithm is $O(n^2 k \log_\sigma m / m^2)$ for $k < m(m+1)/(5 \log_\sigma m)$, which is optimal and matches the best previous result, which allows only substitutions. For higher error levels, we present an algorithm with time complexity $O(n^2 k/(w\sqrt{\sigma}))$, where $w$ is the size in bits of the computer word and $\sigma$ is the alphabet size. This algorithm works for $k < m(m+1)(1 - e/\sqrt{\sigma})$, where $e = 2.718\ldots$, a limit that cannot be improved. These are the first good expected-case algorithms for the problem. Our algorithms also work for rectangular patterns and rectangular texts, and can even be extended to the case where each row in the pattern and the text has a different length.
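As a rough illustration of the two error limits quoted above, the following sketch evaluates them for a given pattern side m and alphabet size sigma. The function names are hypothetical and chosen only for this example; the formulas are taken directly from the bounds stated in the abstract, and the code does not implement the search algorithms themselves.

```python
import math


def filter_error_limit(m: int, sigma: int) -> float:
    """Bound m(m+1) / (5 log_sigma m) on k for the filtering algorithm.

    Hypothetical helper name; requires m > 1 and sigma > 1 so that the
    base-sigma logarithm is defined and nonzero.
    """
    return m * (m + 1) / (5 * math.log(m, sigma))


def high_error_limit(m: int, sigma: int) -> float:
    """Bound m(m+1)(1 - e / sqrt(sigma)) on k for the higher-error algorithm.

    Hypothetical helper name. The value is positive only when
    sqrt(sigma) > e, i.e. sigma >= 8 for integer alphabet sizes.
    """
    return m * (m + 1) * (1 - math.e / math.sqrt(sigma))


if __name__ == "__main__":
    # Example values (assumed for illustration): 16 x 16 pattern, 16 symbols.
    m, sigma = 16, 16
    print("filter limit on k:", filter_error_limit(m, sigma))
    print("high-error limit on k:", high_error_limit(m, sigma))
```

Note that for small alphabets the second expression is not positive (for example with sigma = 4, since e / 2 > 1), so under this reading of the bound the higher-error algorithm has no admissible k in that case.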