Lightweight Data Indexing and Compression in External Memory

Cited by: 40
Authors
Ferragina, Paolo [2]
Gagie, Travis [3]
Manzini, Giovanni [1]
Affiliations
[1] Univ Piemonte Orientale, Dipartimento Informat, Alessandria, Italy
[2] Univ Pisa, Dipartimento Informat, Pisa, Italy
[3] Aalto Univ, Dept Comp Sci & Engn, Helsinki, Finland
Keywords
Burrows-Wheeler transform; Compressed indexes; Data compression; Space-efficient algorithms; External memory scan-based algorithms; BURROWS-WHEELER TRANSFORM; SUFFIX ARRAYS; SPACE; ALGORITHM; BWT;
DOI
10.1007/s00453-011-9535-0
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
In this paper we describe algorithms for computing the Burrows-Wheeler Transform (BWT) and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight in the sense that, for an input of size n, they use only n bits of working space on disk while all previous approaches use Θ(n log n) bits. This is achieved by building the BWT directly without passing through the construction of the Suffix Array/Tree data structure. Moreover, our algorithms access disk data only via sequential scans, thus they take full advantage of modern disk features that make sequential disk accesses much faster than random accesses. We also present a scan-based algorithm for inverting the BWT that uses Θ(n) bits of working space, and a lightweight internal-memory algorithm for computing the BWT which is the fastest in the literature when the available working space is o(n) bits. Finally, we prove lower bounds on the complexity of computing and inverting the BWT via sequential scans in terms of the classic product: internal-memory space × number of passes over the disk data, showing that our algorithms are within an O(log n) factor of the optimal.
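For readers unfamiliar with the transform the abstract refers to, the following is a minimal in-memory sketch of the BWT and its inversion. It is not the paper's lightweight external-memory algorithm: it sorts all rotations explicitly (in effect materialising a suffix-array-like permutation, which is exactly what the paper avoids), and the function names `bwt` and `inverse_bwt` and the choice of sentinel character are assumptions made for illustration only.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive BWT: sort all rotations of text + sentinel, return the last column.

    Assumption: the sentinel does not occur in text and compares smaller
    than every character of text.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    n = len(s)
    # Sort rotation start positions by the rotation they denote.
    order = sorted(range(n), key=lambda i: s[i:] + s[:i])
    # The last character of the rotation starting at i is s[i - 1].
    return "".join(s[i - 1] for i in order)


def inverse_bwt(last: str) -> str:
    """Invert the BWT with the standard LF mapping.

    Assumption: the sentinel is the smallest character in `last`, so the
    rotation that starts with it is row 0 of the sorted matrix.
    """
    counts = Counter(last)
    # first_pos[c] = index of the first row whose first character is c.
    first_pos, total = {}, 0
    for ch in sorted(counts):
        first_pos[ch] = total
        total += counts[ch]
    # lf[i] = row whose first character is the occurrence last[i].
    seen = Counter()
    lf = []
    for ch in last:
        lf.append(first_pos[ch] + seen[ch])
        seen[ch] += 1
    # Walk backwards through the text, starting from row 0.
    out = []
    row = 0
    for _ in range(len(last) - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))
```

For example, `bwt("banana", "$")` yields `"annb$aa"`, and `inverse_bwt("annb$aa")` recovers `"banana"`. The sketch keeps the whole input and the `lf` array in memory, so it illustrates only what is being computed, not the space or I/O behaviour the abstract describes.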
Pages: 707-730
Page count: 24