Suffix Array Construction in External Memory Using D-Critical Substrings

Cited by: 15
Authors
Nong, Ge [1]
Chan, Wai Hong [2]
Zhang, Sen [3]
Guan, Xiao Feng [1]
Affiliations
[1] Sun Yat Sen Univ, Dept Comp Sci, Guangzhou 510275, Guangdong, Peoples R China
[2] Hong Kong Inst Educ, Dept Math & Informat Technol, Kowloon, Hong Kong, Peoples R China
[3] SUNY Coll Oneonta, Dept Math Comp Sci & Stat, Oneonta, NY 13820 USA
Keywords
Suffix array; sorting algorithm; external memory; Algorithms; Performance; LINEAR-TIME CONSTRUCTION
DOI
10.1145/2518175
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a new suffix array construction algorithm that aims to build, in external memory, the suffix array for an input string whose length n is on the order of tens of giga-characters, over a constant or integer alphabet. The core of this algorithm is adapted from the framework of the original internal-memory SA-DS algorithm, which samples fixed-size d-critical substrings. This new external-memory algorithm, called EM-SA-DS, uses novel cache data structures to construct a suffix array in a sequential scanning manner with good spatial locality: data is read from or written to disk sequentially. On the assumed external-memory model with RAM capacity Ω((nB)^{0.5}), disk capacity O(n), and I/O block size B, all measured in log n-bit words, the I/O complexity of EM-SA-DS is O(n/B). This work provides a general cache-based solution that could be further exploited to develop external-memory solutions for other suffix-array-related problems, for example, computing the longest-common-prefix array, using a modern personal computer with a typical memory configuration of 4GB RAM and a single disk.
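For readers unfamiliar with the data structure, the following is a minimal Python sketch of what a suffix array is. It is not EM-SA-DS and not SA-DS: it is a naive in-memory construction that sorts suffixes directly, and the function name `naive_suffix_array` is an illustrative assumption rather than anything defined in the paper.

```python
def naive_suffix_array(s: str) -> list[int]:
    """Return the start positions of all suffixes of s in lexicographic order.

    Naive illustration only: it materializes every suffix as a slice,
    so it is suitable for short strings and nothing more.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order: a, ana, anana, banana, na, nana
    print(naive_suffix_array("banana"))  # prints [5, 3, 1, 0, 4, 2]
```

This direct approach needs random access to the whole string and far more than linear work, which is why inputs of the size described in the abstract call for an algorithm designed around sequential disk scans instead.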
Pages: 15
Related papers
23 in total
  • [1] Linear Time Suffix Array Construction Using D-Critical Substrings
    Nong, Ge
    Zhang, Sen
    Chan, Wai Hong
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2009, 5577 : 54 - +
  • [2] Generalized enhanced suffix array construction in external memory
    Felipe A. Louza
    Guilherme P. Telles
    Steve Hoffmann
    Cristina D. A. Ciferri
    Algorithms for Molecular Biology, 12
  • [3] Generalized enhanced suffix array construction in external memory
    Louza, Felipe A.
    Telles, Guilherme P.
    Hoffmann, Steve
    Ciferri, Cristina D. A.
    ALGORITHMS FOR MOLECULAR BIOLOGY, 2017, 12
  • [4] Engineering a Lightweight External Memory Suffix Array Construction Algorithm
    Kärkkäinen J.
    Kempa D.
    Mathematics in Computer Science, 2017, 11 (2) : 137 - 149
  • [5] External Memory Generalized Suffix and LCP Arrays Construction
    Louza, Felipe A.
    Telles, Guilherme P.
    De Aguiar Ciferri, Cristina Dutra
    COMBINATORIAL PATTERN MATCHING, 2013, 7922 : 201 - 210
  • [6] Parallel Suffix Array Construction for Shared Memory Architectures
    Osipov, Vitaly
    STRING PROCESSING AND INFORMATION RETRIEVAL: 19TH INTERNATIONAL SYMPOSIUM, SPIRE 2012, 2012, 7608 : 379 - 384
  • [7] Using GPU to Accelerate Suffix Array Construction
    Sun, Weidong
    2014 7TH INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS (BMEI 2014), 2014, : 677 - 682
  • [8] A theoretical and experimental study on the construction of suffix arrays in external memory
    Crauser, A
    Ferragina, P
    ALGORITHMICA, 2002, 32 (01) : 1 - 35
  • [9] A survey of practical algorithms for suffix tree construction in external memory
    Barsky, M.
    Stege, U.
    Thomo, A.
    SOFTWARE-PRACTICE & EXPERIENCE, 2010, 40 (11) : 965 - 988