A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings

Cited by: 3
Authors
Chen, Kuan-Yu [1]
Chao, Kun-Mao [1]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Keywords
Compressed pattern matching; Edit distance; Run length;
DOI
10.1007/s00453-011-9592-4
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (dual of longest common subsequence), which requires O(mn log mn) time, where m and n denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing its first "fully compressed" algorithm whose running time depends solely on the compressed string lengths. Specifically, given two strings, compressed into m and n runs, m ≤ n, we present an O(mn^2)-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of m runs in a text of n runs in O(mn^2) time.
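To make the problem setting concrete, the following minimal Python sketch shows run-length encoding and the classic uncompressed edit-distance dynamic program. It is only the baseline the abstract contrasts against, whose running time depends on the uncompressed lengths; it is not the paper's O(mn^2) algorithm. All function names here are hypothetical and chosen for illustration.

```python
from itertools import groupby


def rle_encode(s):
    """Return the run-length encoding of s as a list of (char, run_length) pairs."""
    return [(ch, sum(1 for _ in group)) for ch, group in groupby(s)]


def rle_decode(runs):
    """Expand a list of (char, run_length) pairs back into a plain string."""
    return "".join(ch * length for ch, length in runs)


def edit_distance(a, b):
    """Classic Levenshtein dynamic program over the uncompressed strings.

    Runs in O(len(a) * len(b)) time, i.e. it depends on the
    uncompressed lengths, not on the number of runs.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def edit_distance_rle_baseline(runs_a, runs_b):
    """Baseline only: decompress both inputs, then run the classic DP."""
    return edit_distance(rle_decode(runs_a), rle_decode(runs_b))
```

For instance, "aaabb" has m = 2 runs and "aabbb" has n = 2 runs; the baseline still fills a table sized by the five-character uncompressed strings, which is the dependence a fully compressed algorithm avoids.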
Pages: 354-370
Page count: 17
Related papers
50 in total
  • [21] Faster Algorithm for Computing the Edit Distance between SLP-Compressed Strings
    Gawrychowski, Pawel
    STRING PROCESSING AND INFORMATION RETRIEVAL: 19TH INTERNATIONAL SYMPOSIUM, SPIRE 2012, 2012, 7608 : 229 - 236
  • [22] AN ALGORITHM FOR MATCHING RUN-LENGTH CODED STRINGS
    BUNKE, H
    CSIRIK, J
    COMPUTING, 1993, 50 (04) : 297 - 314
  • [23] Shortest Unique Palindromic Substring Queries on Run-Length Encoded Strings
    Watanabe, Kiichi
    Nakashima, Yuto
    Inenaga, Shunsuke
    Bannai, Hideo
    Takeda, Masayuki
    COMBINATORIAL ALGORITHMS, IWOCA 2019, 2019, 11638 : 430 - 441
  • [24] Lyndon Factorization Algorithms for Small Alphabets and Run-Length Encoded Strings
    Ghuman, Sukhpal Singh
    Giaquinta, Emanuele
    Tarhio, Jorma
    ALGORITHMS, 2019, 12 (06)
  • [25] Edit distance for a run-length-encoded string and an uncompressed string
    Liu, J. J.
    Huang, G. S.
    Wang, Y. L.
    Lee, R. C. T.
    INFORMATION PROCESSING LETTERS, 2007, 105 (01) : 12 - 16
  • [26] Approximate Matching for Run-Length Encoded Strings Is 3SUM-Hard
    Chen, Kuan-Yu
    Hsu, Ping-Hui
    Chao, Kun-Mao
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2009, 5577 : 168 - 179
  • [27] An algorithm for the rapid computation of boundaries of run-length encoded regions
    Quek, FKH
    PATTERN RECOGNITION, 2000, 33 (10) : 1637 - 1649
  • [28] Fast Algorithms for the Shortest Unique Palindromic Substring Problem on Run-Length Encoded Strings
    Kiichi Watanabe
    Yuto Nakashima
    Shunsuke Inenaga
    Hideo Bannai
    Masayuki Takeda
    Theory of Computing Systems, 2020, 64 : 1273 - 1291
  • [29] Fast Algorithms for the Shortest Unique Palindromic Substring Problem on Run-Length Encoded Strings
    Watanabe, Kiichi
    Nakashima, Yuto
    Inenaga, Shunsuke
    Bannai, Hideo
    Takeda, Masayuki
    THEORY OF COMPUTING SYSTEMS, 2020, 64 (07) : 1273 - 1291
  • [30] Algorithms for Jumbled Indexing, Jumbled Border and Jumbled Square on Run-Length Encoded Strings
    Amir, Amihood
    Apostolico, Alberto
    Hirst, Tirza
    Landau, Gad M.
    Lewenstein, Noa
    Rozenberg, Liat
    STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2014, 2014, 8799 : 45 - 51