Compressed indexes for text with wildcards

Cited by: 3
Author
Thachuk, Chris [1]
Affiliation
[1] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1Z4, Canada
Keywords
Compressed indexes; Text search; Wildcard matching; Dictionary matching; SEARCH;
DOI
10.1016/j.tcs.2012.08.011
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202
Abstract
We study the problem of indexing text with wildcard positions, motivated by the challenge of aligning sequencing data to large genomes that contain millions of single nucleotide polymorphisms (SNPs), which are positions known to differ between individuals. SNPs modeled as wildcards can lead to more informed and biologically relevant alignments. We improve the space complexity while maintaining the query time complexity of previous approaches by giving a compressed index requiring 2nH_k(T) + o(n log σ) + O(n + d log n) bits for a text T of length n over an alphabet of size σ containing d groups of wildcards. The new index is particularly favorable for larger alphabets and comparable for smaller alphabets, such as DNA. A key to the space reduction is a result we give showing how any compressed suffix array can be supplemented with auxiliary data structures occupying O(n) + O(d log(n/d)) bits to also support efficient dictionary matching queries. We discuss how the space can be reduced further by a number of approaches and by allowing an increase in the worst-case query time. We also present a new query algorithm for our wildcard indexes that can greatly reduce the query working space. © 2012 Elsevier B.V. All rights reserved.
Pages: 22-35
Number of pages: 14