Suffix Array Construction in External Memory Using D-Critical Substrings

Cited by: 15
Authors
Nong, Ge [1]
Chan, Wai Hong [2]
Zhang, Sen [3]
Guan, Xiao Feng [1]
Affiliations
[1] Sun Yat Sen Univ, Dept Comp Sci, Guangzhou 510275, Guangdong, Peoples R China
[2] Hong Kong Inst Educ, Dept Math & Informat Technol, Kowloon, Hong Kong, Peoples R China
[3] SUNY Coll Oneonta, Dept Math Comp Sci & Stat, Oneonta, NY 13820 USA
Keywords
Suffix array; sorting algorithm; external memory; algorithms; performance; linear-time construction
DOI
10.1145/2518175
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We present a new suffix array construction algorithm that aims to build, in external memory, the suffix array of an input string of length n, where n is on the order of tens of giga-characters, over a constant or integer alphabet. The core of the algorithm is adapted from the framework of the original internal-memory SA-DS algorithm, which samples fixed-size d-critical substrings. The new external-memory algorithm, called EM-SA-DS, uses novel cache data structures to construct a suffix array by sequential scanning with good spatial locality: data is read from or written to disk sequentially. In the assumed external-memory model, with RAM capacity Ω((nB)^0.5), disk capacity O(n), and I/O block size B, all measured in (log n)-bit words, the I/O complexity of EM-SA-DS is O(n/B). This work provides a general cache-based solution that could be further exploited to develop external-memory solutions for other suffix-array-related problems, such as computing the longest-common-prefix array, on a modern personal computer with a typical memory configuration of 4 GB RAM and a single disk.
Pages: 15
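For readers unfamiliar with the data structure named in the abstract, the following is a minimal in-memory sketch of what a suffix array is. It is not EM-SA-DS or SA-DS: it simply sorts suffix start positions by comparing the suffixes directly, which is far slower than the algorithm described and only practical for short strings. The function name `naive_suffix_array` is a hypothetical choice for illustration.

```python
def naive_suffix_array(s: str) -> list[int]:
    """Return the start positions of all suffixes of s in lexicographic order.

    Illustration only: this materializes and compares whole suffixes in RAM,
    so it does not reflect the sequential-scan, external-memory approach
    that the abstract describes.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order:
    # a (5), ana (3), anana (1), banana (0), na (4), nana (2)
    print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

For an input of tens of giga-characters, neither the string nor the array of positions fits in a few gigabytes of RAM, which is the situation the external-memory algorithm is designed for.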