Sparse Text Indexing in Small Space

Cited by: 10
Authors
Bille, Philip [1]
Fischer, Johannes [2]
Gortz, Inge Li [1]
Kopelowitz, Tsvi [3]
Sach, Benjamin [4]
Vildhoj, Hjalte Wedel [1]
Affiliations
[1] Tech Univ Denmark, DTU Compute, DK-2800 Lyngby, Denmark
[2] TU Dortmund, Dept Comp Sci, Otto Hahn Str 14, D-44227 Dortmund, Germany
[3] Weizmann Inst Sci, Fac Math & Comp Sci, 234 Herzl St, IL-76100 Rehovot, Israel
[4] Univ Bristol, Dept Comp Sci, Merchant Venturers Bldg, Bristol BS8 1TH, Avon, England
Keywords
Sparse text indexing; sparse suffix tree; sparse suffix array; sparse suffix sorting; sparse position heap; Karp Rabin fingerprints; SUFFIX ARRAYS; CONSTRUCTION; ALGORITHM; TREES;
DOI
10.1145/2836166
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In this work, we present efficient algorithms for constructing sparse suffix trees, sparse suffix arrays, and sparse position heaps for b arbitrary positions of a text T of length n while using only O(b) words of space during the construction. Attempts at breaking the naive bound of Ω(nb) time for constructing sparse suffix trees in O(b) space can be traced back to the origins of string indexing in 1968. First results were not obtained until 1996, but only for the case in which the b suffixes were evenly spaced in T. In this article, there is no constraint on the locations of the suffixes. Our main contribution is to show that the sparse suffix tree (and array) can be constructed in O(n log^2 b) time. To achieve this, we develop a technique that allows one to efficiently answer b longest common prefix queries on suffixes of T, using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte Carlo, and outputs the correct tree with high probability. We then give a Las Vegas algorithm, which also uses O(b) space and runs in the same time bounds with high probability when b = O(√n). Additional trade-offs between space usage and construction time for the Monte Carlo algorithm are given. Finally, we show that, at the expense of slower pattern queries, it is possible to construct sparse position heaps in O(n + b log b) time and O(b) space.
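
The abstract's central ingredient, answering longest common prefix (LCP) queries on suffixes with Karp-Rabin fingerprints, can be illustrated with a minimal Python sketch. This is not the article's algorithm: for simplicity it stores a fingerprint for every prefix of T, so it uses O(n) words rather than the O(b) words the article achieves, and it sorts the b suffixes with a generic comparison sort. The function names, the modulus, and the fixed base are assumptions made for illustration only. As in the Monte Carlo setting, a fingerprint collision could in principle produce a wrong answer, with small probability.

```python
from functools import cmp_to_key

MOD = (1 << 61) - 1   # assumed modulus (a Mersenne prime), not taken from the article
BASE = 911382323      # assumed fixed base; a real Monte Carlo scheme would pick it at random


def build_fingerprints(text):
    """Return prefix fingerprints and powers of BASE for `text`.

    Note: this table takes O(n) words, unlike the O(b)-space method in the article.
    """
    n = len(text)
    prefix = [0] * (n + 1)
    power = [1] * (n + 1)
    for i, ch in enumerate(text):
        prefix[i + 1] = (prefix[i] * BASE + ord(ch) + 1) % MOD
        power[i + 1] = (power[i] * BASE) % MOD
    return prefix, power


def fingerprint(prefix, power, i, length):
    """Karp-Rabin fingerprint of text[i : i + length]."""
    return (prefix[i + length] - prefix[i] * power[length]) % MOD


def lcp(text, prefix, power, i, j):
    """Length of the longest common prefix of suffixes i and j.

    Binary search on the length; equal fingerprints are treated as equal
    substrings, which is correct unless a collision occurs.
    """
    lo, hi = 0, len(text) - max(i, j)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if fingerprint(prefix, power, i, mid) == fingerprint(prefix, power, j, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def sparse_suffix_array(text, positions):
    """Sort the given suffix start positions lexicographically using LCP queries."""
    prefix, power = build_fingerprints(text)
    n = len(text)

    def compare(i, j):
        if i == j:
            return 0
        l = lcp(text, prefix, power, i, j)
        if i + l == n:      # suffix i is a proper prefix of suffix j
            return -1
        if j + l == n:      # suffix j is a proper prefix of suffix i
            return 1
        return -1 if text[i + l] < text[j + l] else 1

    return sorted(positions, key=cmp_to_key(compare))


if __name__ == "__main__":
    # Suffixes at 0, 2, 4 of "banana" are "banana", "nana", "na".
    print(sparse_suffix_array("banana", [0, 2, 4]))  # prints [0, 4, 2]
```

Each comparison costs one LCP query of O(log n) fingerprint checks, so the sketch sorts b suffixes with O(b log b) queries after O(n) preprocessing. The article's contribution lies in obtaining comparable query power while keeping only O(b) words in memory, which this sketch does not attempt.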
Pages: 19