A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings

Cited by: 3
Authors
Chen, Kuan-Yu [1]
Chao, Kun-Mao [1]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Keywords
Compressed pattern matching; Edit distance; Run length;
DOI
10.1007/s00453-011-9592-4
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (the dual of the longest common subsequence), which requires O(mn log mn) time, where m and n denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing the first "fully compressed" algorithm for this problem, whose running time depends solely on the compressed string lengths. Specifically, given two strings compressed into m and n runs, m ≤ n, we present an O(mn^2)-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of m runs in a text of n runs in O(mn^2) time.
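To make the terms in the abstract concrete, the following is a minimal sketch of run-length encoding together with the textbook dynamic program for edit distance on uncompressed strings. It is not the O(mn^2) algorithm of the paper; it is the baseline whose running time depends on the uncompressed lengths. The function names run_length_encode and edit_distance are illustrative assumptions, not names taken from the paper.

```python
from itertools import groupby


def run_length_encode(s):
    """Return s as a list of (character, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def edit_distance(a, b):
    """Textbook dynamic program on the uncompressed strings.

    Unit-cost insertions, deletions and substitutions.
    Runs in O(len(a) * len(b)) time, so it is NOT fully compressed.
    """
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

For example, the string "aaaabbb" has 7 characters but only 2 runs. The baseline above does work proportional to the product of the character counts, whereas the bound stated in the abstract depends only on the run counts m and n.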
Pages: 354-370
Number of pages: 17
Related papers
  • [1] A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings
    Chen, Kuan-Yu
    Chao, Kun-Mao
    ALGORITHMS-ESA 2010, 2010, 6346 : 415 - 426
  • [2] A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings
    Kuan-Yu Chen
    Kun-Mao Chao
    Algorithmica, 2013, 65 : 354 - 370
  • [3] Edit distance of run-length encoded strings
    Arbell, O
    Landau, GM
    Mitchell, JSB
    INFORMATION PROCESSING LETTERS, 2002, 83 (06) : 307 - 314
  • [4] An improved algorithm for computing the edit distance of run-length coded strings
    Bunke, H
    Csirik, J
    INFORMATION PROCESSING LETTERS, 1995, 54 (02) : 93 - 96
  • [5] Matching for run-length encoded strings
    Apostolico, A
    Landau, GM
    Skiena, S
    JOURNAL OF COMPLEXITY, 1999, 15 (01) : 4 - 16
  • [6] A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings
    Ann, Hsing-Yen
    Yang, Chang-Biau
    Tseng, Chiou-Ting
    Hor, Chiou-Yi
    INFORMATION PROCESSING LETTERS, 2008, 108 (06) : 360 - 364
  • [7] Matching for run-length encoded strings
    Apostolico, A
    Landau, GM
    Skiena, S
    COMPRESSION AND COMPLEXITY OF SEQUENCES 1997 - PROCEEDINGS, 1998, : 348 - 356
  • [8] Computing the Longest Common Subsequence of Two Run-Length Encoded Strings
    Sakai, Yoshifumi
    ALGORITHMS AND COMPUTATION, ISAAC 2012, 2012, 7676 : 197 - 206
  • [9] Computing similarity of run-length encoded strings with affine gap penalty
    Kim, Jin Wook
    Amir, Amihood
    Landau, Gad M.
    Park, Kunsoo
    String Processing and Information Retrieval, Proceedings, 2005, 3772 : 315 - 326
  • [10] Fast algorithms for computing the constrained LCS of run-length encoded strings
    Ann, Hsing-Yen
    Yang, Chang-Biau
    Tseng, Chiou-Ting
    Hor, Chiou-Yi
    THEORETICAL COMPUTER SCIENCE, 2012, 432 : 1 - 9