Lightweight Data Indexing and Compression in External Memory

Cited by: 40
Authors
Ferragina, Paolo [2 ]
Gagie, Travis [3 ]
Manzini, Giovanni [1 ]
Affiliations
[1] Univ Piemonte Orientale, Dipartimento Informat, Alessandria, Italy
[2] Univ Pisa, Dipartimento Informat, Pisa, Italy
[3] Aalto Univ, Dept Comp Sci & Engn, Helsinki, Finland
Keywords
Burrows-Wheeler transform; Compressed indexes; Data compression; Space-efficient algorithms; External memory scan-based algorithms; BURROWS-WHEELER TRANSFORM; SUFFIX ARRAYS; SPACE; ALGORITHM; BWT
DOI
10.1007/s00453-011-9535-0
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In this paper we describe algorithms for computing the Burrows-Wheeler Transform (BWT) and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight in the sense that, for an input of size n, they use only n bits of working space on disk while all previous approaches use Θ(n log n) bits. This is achieved by building the BWT directly without passing through the construction of the Suffix Array/Tree data structure. Moreover, our algorithms access disk data only via sequential scans, thus they take full advantage of modern disk features that make sequential disk accesses much faster than random accesses. We also present a scan-based algorithm for inverting the BWT that uses Θ(n) bits of working space, and a lightweight internal-memory algorithm for computing the BWT which is the fastest in the literature when the available working space is o(n) bits. Finally, we prove lower bounds on the complexity of computing and inverting the BWT via sequential scans in terms of the classic product: internal-memory space × number of passes over the disk data, showing that our algorithms are within an O(log n) factor of the optimal.
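For readers unfamiliar with the transform the abstract refers to, the following is a minimal in-memory sketch of the textbook BWT and its inversion. It is not the paper's lightweight, scan-based algorithm: sorting all rotation start positions is effectively the suffix-array route that the abstract says the authors avoid. The function names `bwt` and `inverse_bwt` and the choice of `"\0"` as end-of-text sentinel are assumptions made for illustration only.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Textbook BWT: last column of the sorted rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Sorting rotation start positions; with a unique sentinel this is
    # equivalent to sorting suffixes (i.e. building a suffix array).
    order = sorted(range(len(s)), key=lambda i: s[i:] + s[:i])
    # The last character of the rotation starting at i is s[i - 1]
    # (for i == 0 this wraps around to the sentinel).
    return "".join(s[i - 1] for i in order)


def inverse_bwt(last: str) -> str:
    """Invert the BWT with the LF-mapping; assumes the sentinel sorts first."""
    n = len(last)
    # A stable sort of positions by character yields the first column:
    # the k-th occurrence of a character in the last column corresponds
    # to its k-th occurrence in the first column.
    order = sorted(range(n), key=lambda i: last[i])
    lf = [0] * n
    for first_pos, last_pos in enumerate(order):
        lf[last_pos] = first_pos
    # Row 0 is the rotation that starts with the sentinel, so last[0] is
    # the final character of the text; following lf walks the text backwards.
    out = []
    row = 0
    for _ in range(n - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))
```

For example, `inverse_bwt(bwt("banana"))` recovers `"banana"`. The sketch keeps every rotation key in memory at once, so it only illustrates the definition and says nothing about the working-space bounds stated in the abstract.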
Pages: 707-730
Number of pages: 24
Related papers
50 in total
  • [1] Lightweight Data Indexing and Compression in External Memory
    Paolo Ferragina
    Travis Gagie
    Giovanni Manzini
    Algorithmica, 2012, 63 : 707 - 730
  • [2] Lightweight Data Indexing and Compression in External Memory
    Ferragina, Paolo
    Gagie, Travis
    Manzini, Giovanni
    LATIN 2010: THEORETICAL INFORMATICS, 2010, 6034 : 697 - +
  • [3] WaterfallTree - External Indexing Data Structure
    Tronkov, Iliya
    2014 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION, QUALITY AND TESTING, ROBOTICS, 2014,
  • [4] Lightweight Indexing and Querying Services for Big Spatial Data
    Lee, Kisung
    Liu, Ling
    Ganti, Raghu K.
    Srivatsa, Mudhakar
    Zhang, Qi
    Zhou, Yang
    Wang, Qingyang
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2019, 12 (03) : 343 - 355
  • [5] On Entropy-Compressed Text Indexing in External Memory
    Hon, Wing-Kai
    Shah, Rahul
    Thankachan, Sharma V.
    Vitter, Jeffrey Scott
    STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5721 : 75 - +
  • [6] Mismatch address indexing for test data compression
    Lee, Lung-Jen
    Tseng, Wang-Dauh
    Lin, Rung-Bin
    JOURNAL OF THE CHINESE INSTITUTE OF ENGINEERS, 2011, 34 (08) : 1035 - 1045
  • [7] Compression, Indexing, and Retrieval for Massive String Data
    Hon, Wing-Kai
    Shah, Rahul
    Vitter, Jeffrey Scott
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2010, 6129 : 260 - +
  • [8] Bidirectional Text Compression in External Memory
    Dinklage, Patrick
    Ellert, Jonas
    Fischer, Johannes
    Köppl, Dominik
    Penschuck, Manuel
    27TH ANNUAL EUROPEAN SYMPOSIUM ON ALGORITHMS (ESA 2019), 2019, 144
  • [9] Fast data series indexing for in-memory data
    Botao Peng
    Panagiota Fatourou
    Themis Palpanas
    The VLDB Journal, 2021, 30 : 1041 - 1067
  • [10] Fast data series indexing for in-memory data
    Peng, Botao
    Fatourou, Panagiota
    Palpanas, Themis
    VLDB JOURNAL, 2021, 30 (06) : 1041 - 1067