Lightweight Data Indexing and Compression in External Memory

Cited by: 40
Authors
Ferragina, Paolo [2]
Gagie, Travis [3]
Manzini, Giovanni [1]
Affiliations
[1] Univ Piemonte Orientale, Dipartimento Informat, Alessandria, Italy
[2] Univ Pisa, Dipartimento Informat, Pisa, Italy
[3] Aalto Univ, Dept Comp Sci & Engn, Helsinki, Finland
Keywords
Burrows-Wheeler transform; Compressed indexes; Data compression; Space-efficient algorithms; External memory scan-based algorithms; BURROWS-WHEELER TRANSFORM; SUFFIX ARRAYS; SPACE; ALGORITHM; BWT;
DOI
10.1007/s00453-011-9535-0
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
In this paper we describe algorithms for computing the Burrows-Wheeler Transform (BWT) and for building (compressed) indexes in external memory. The innovative feature of our algorithms is that they are lightweight: for an input of size n, they use only n bits of working space on disk, while all previous approaches use Θ(n log n) bits. This is achieved by building the BWT directly, without first constructing the Suffix Array or Suffix Tree. Moreover, our algorithms access disk data only via sequential scans, so they take full advantage of modern disk features that make sequential accesses much faster than random accesses. We also present a scan-based algorithm for inverting the BWT that uses Θ(n) bits of working space, and a lightweight internal-memory algorithm for computing the BWT which is the fastest in the literature when the available working space is o(n) bits. Finally, we prove lower bounds on the complexity of computing and inverting the BWT via sequential scans in terms of the classic product (internal-memory space × number of passes over the disk data), showing that our algorithms are within an O(log n) factor of the optimal.
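For readers unfamiliar with the transform itself, the following is a minimal in-memory sketch of the textbook BWT and its inversion. It is not the paper's external-memory algorithm: it sorts all rotations explicitly, which is exactly the heavy working space the abstract says the lightweight algorithms avoid. The function names `bwt` and `inverse_bwt` and the sentinel convention are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive BWT: sort all rotations of text + sentinel, take the last column.

    Illustrative only; this materialises every rotation in memory.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, sentinel: str = "\0") -> str:
    """Invert the BWT by following the LF-mapping backwards through the text."""
    n = len(last)
    # A stable sort of the last column yields the first column; order[j] is
    # the position in the last column of the j-th character of the first.
    order = sorted(range(n), key=lambda i: last[i])
    lf = [0] * n
    for first_pos, last_pos in enumerate(order):
        lf[last_pos] = first_pos
    # Start at the row that begins with the sentinel; its last character
    # is the final character of the original text.
    row = lf[last.index(sentinel)]
    out = []
    for _ in range(n - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))
```

Calling `bwt("banana", "$")` and passing the result back through `inverse_bwt` with the same sentinel recovers the input string.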
Pages: 707-730
Number of pages: 24