Text indexing with errors

Authors
Maass, MG [1]
Nowak, J [1]
Affiliations
[1] Tech Univ Munich, Fak Informat, D-85748 Garching, Germany
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we address the problem of constructing an index for a text document or a collection of documents that answers various questions about the occurrences of a pattern when a constant number of errors is allowed. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs, in time linear in the size of the query string and the number of results. This improves over previous work, where the lookup time is either not linear or depends on the size of the document corpus. Our data structure has size O(n log^k n) on average and with high probability for input size n and queries with up to k errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear lookup time on average.
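To make the query types concrete, the following is a minimal sketch of the problem the index solves, not of the index itself. It scans each text with the classic dynamic-programming approach for approximate matching (often attributed to Sellers), so its running time depends on the text length, which is exactly the cost an index is meant to avoid. It assumes that "errors" means edit distance (insertions, deletions, substitutions); the function names `approx_end_positions` and `documents_with_pattern` are hypothetical and chosen for illustration only.

```python
from typing import List


def approx_end_positions(text: str, pattern: str, k: int) -> List[int]:
    """Return every end index j >= 1 such that some substring of
    text[:j] ending at j is within edit distance k of pattern.

    Baseline scan in O(len(text) * len(pattern)) time; assumes the
    error model is edit distance (an assumption, see the lead-in).
    """
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # pattern character missing
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits


def documents_with_pattern(docs: List[str], pattern: str, k: int) -> List[int]:
    """Return the indices of all documents containing the pattern
    with at most k errors (the 'all documents' query type)."""
    return [i for i, doc in enumerate(docs)
            if approx_end_positions(doc, pattern, k)]


if __name__ == "__main__":
    print(approx_end_positions("abracadabra", "abra", 0))
    print(documents_with_pattern(
        ["text indexing", "suffix trees", "test index"], "text", 1))
```

With k = 0 the scan reduces to exact matching; with k = 1 the pattern "text" also matches "test" in the third document. The index described in the abstract answers the same queries without scanning the text at query time.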
Pages: 21-32
Page count: 12