Compressed full-text indexes

Cited by: 410
Authors
Navarro, Gonzalo
Mäkinen, Veli
Affiliations
[1] Univ Chile, Santiago, Chile
[2] Univ Helsinki, FIN-00014 Helsinki, Finland
Keywords
algorithms; text indexing; text compression; entropy
DOI
10.1145/1216370.1216372
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into self-indexes, which in addition contain enough information to reproduce any text portion, so they replace the text. The exciting possibility of an index that takes space close to that of the compressed text, replaces it, and in addition provides fast search over it, has triggered a wealth of activity and produced surprising results in a very short time, which radically changed the status of this area in less than 5 years. The most successful indexes nowadays are able to obtain almost optimal space and search time simultaneously. In this article we present the main concepts underlying (compressed) self-indexes. We explain the relationship between text entropy and regularities that show up in index structures and permit compressing them. Then we cover the most relevant self-indexes, focusing on how they exploit text compressibility to achieve compact structures that can efficiently solve various search problems. Our aim is to give the background to understand and follow the developments in this area.
Pages: 61