A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings

Cited by: 3
Authors
Chen, Kuan-Yu [1]
Chao, Kun-Mao [1]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Keywords
Compressed pattern matching; Edit distance; Run length
DOI
10.1007/s00453-011-9592-4
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (dual of longest common subsequence), which requires $O(mn \log mn)$ time, where $m$ and $n$ denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing its first "fully compressed" algorithm whose running time depends solely on the compressed string lengths. Specifically, given two strings, compressed into $m$ and $n$ runs, $m \le n$, we present an $O(mn^2)$-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of $m$ runs in a text of $n$ runs in $O(mn^2)$ time.
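For orientation, the two notions the abstract relies on can be sketched in a few lines of Python. This is not the paper's fully compressed algorithm: it is the textbook dynamic program over the uncompressed strings, that is, the baseline whose running time depends on the uncompressed lengths. The function names `run_length_encode`, `run_length_decode`, and `edit_distance` are hypothetical, and unit costs for insertion, deletion, and substitution are assumed.

```python
from itertools import groupby


def run_length_encode(s):
    """Return s as a list of (character, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def run_length_decode(runs):
    """Expand (character, run_length) pairs back into a plain string."""
    return "".join(ch * count for ch, count in runs)


def edit_distance(a, b):
    """Textbook DP over the uncompressed strings, assuming unit costs.

    Runs in time proportional to len(a) * len(b), so it depends on the
    uncompressed lengths rather than on the number of runs.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Illustrative inputs (not taken from the paper): both strings have 3 runs.
x_runs = run_length_encode("aaaabbbcc")
y_runs = run_length_encode("aabbbbcc")
distance = edit_distance(run_length_decode(x_runs), run_length_decode(y_runs))
```

Here each input has only three runs, yet the baseline still fills a table sized by the decoded lengths; the abstract's contribution is a bound stated purely in the run counts $m$ and $n$.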
Pages: 354 - 370
Page count: 17
Related papers
50 in total
  • [41] Efficient retrieval of approximate palindromes in a run-length encoded string
    Chen, Kuan-Yu
    Hsu, Ping-Hui
    Chao, Kun-Mao
    THEORETICAL COMPUTER SCIENCE, 2012, 432 : 28 - 37
  • [42] Inplace run-length 2d compressed search
    Amir, A
    Landau, GM
    Sokol, D
    PROCEEDINGS OF THE ELEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2000 : 817 - 818
  • [43] Template-based rendering of run-length encoded volumes
    Lee, CH
    Koo, YM
    Shin, YG
    FIFTH PACIFIC CONFERENCE ON COMPUTER GRAPHICS AND APPLICATIONS, PROCEEDINGS, 1997 : 138 - 147
  • [44] Renyi entropy and pattern matching for run-length encoded sequences
    Rousseau, Jerome
    ALEA-LATIN AMERICAN JOURNAL OF PROBABILITY AND MATHEMATICAL STATISTICS, 2021, 18 (01) : 887 - 905
  • [45] Entropy Computations of Document Images in Run-Length Compressed Domain
    Nagabhushan, P.
    Javed, Mohammed
    Chaudhuri, B. B.
    2014 FIFTH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP 2014), 2014 : 287 - 291
  • [46] Inplace run-length 2d compressed search
    Amir, A
    Landau, GM
    Sokol, D
    THEORETICAL COMPUTER SCIENCE, 2003, 290 (03) : 1361 - 1383
  • [47] FAST STRING-MATCHING ALGORITHMS FOR RUN-LENGTH CODED STRINGS
    CHUNG, KL
    COMPUTING, 1995, 54 (02) : 119 - 125
  • [48] Partitionable bus-based string-matching algorithm for run-length coded strings with VLDCs
    Chen, HN
    Chung, KL
    VLSI DESIGN, 1999, 9 (01) : 55 - 67
  • [49] Arithmetic and Boolean Operations on Recursively Run-Length Compressed Natural Numbers
    Tarau, Paul
    SCIENTIFIC ANNALS OF COMPUTER SCIENCE, 2014, 24 (02) : 287 - 323
  • [50] LZ77 Computation Based on the Run-Length Encoded BWT
    Policriti, Alberto
    Prezza, Nicola
    ALGORITHMICA, 2018, 80 (07) : 1986 - 2011