Compressed Text Indexing with Wildcards

Cited by: 0
Authors
Hon, Wing-Kai [1]
Ku, Tsung-Han [1]
Shah, Rahul [2]
Thankachan, Sharma V. [2]
Vitter, Jeffrey Scott [3]
Affiliations
[1] Natl Tsing Hua Univ, Hsinchu, Taiwan
[2] Louisiana State Univ, Baton Rouge, LA 70803 USA
[3] Univ Kansas, Lawrence, KS 66045 USA
Keywords
SUFFIX ARRAYS;
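DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract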
Let $T = T_1 \phi^{k_1} T_2 \phi^{k_2} \cdots \phi^{k_d} T_{d+1}$ be a text of total length $n$, where the characters of each $T_i$ are chosen from an alphabet $\Sigma$ of size $\sigma$, and $\phi$ denotes a wildcard symbol. The text indexing with wildcards problem is to index $T$ such that, given a query pattern $P$, we can locate the occurrences of $P$ in $T$ efficiently. This problem has been applied to indexing genomic sequences that contain single-nucleotide polymorphisms (SNPs), because SNPs can be modeled as wildcards. Recently, Tam et al. (2009) and Thachuk (2011) proposed succinct indexes for this problem. In this paper, we present the first compressed index for this problem, which takes only $nH_h + o(n \log \sigma) + O(d \log n)$ bits of space, where $H_h$ is the $h$th-order empirical entropy of $T$ ($h = o(\log_\sigma n)$).
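To make the problem statement concrete, the following is a minimal sketch of the matching semantics only: a naive scan, not the compressed index the abstract describes. The function name `occurrences` and the use of `?` as a stand-in for the wildcard symbol are assumptions made for illustration; a wildcard in the text is taken to match any single pattern character.

```python
from typing import List

WILDCARD = "?"  # hypothetical stand-in for the wildcard symbol phi


def occurrences(text: str, pattern: str) -> List[int]:
    """Return every start position where pattern matches text.

    A wildcard in the text matches any single pattern character.
    This is a brute-force O(n * |P|) scan that builds no index.
    """
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        window = text[i:i + m]
        # Each text character must be a wildcard or equal the pattern character.
        if all(t == WILDCARD or t == p for t, p in zip(window, pattern)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # A short DNA-like text with one wildcard position, as for an SNP.
    print(occurrences("AC?GT", "CAG"))
```

An index for this problem aims to answer the same query without rescanning the whole text for each pattern.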
Pages: 267 / +
Page count: 3