A NEW SIGNATURE APPROACH FOR RETRIEVAL OF DOCUMENTS FROM FREE-TEXT DATABASES

Cited by: 2
Authors
TAVAKOLI, N
RAY, A
Affiliation
[1] Department of Computer Science, University of North Carolina at Charlotte, Charlotte
Keywords
SIGNATURE; FALSE DROP; INVERSION; SUPERIMPOSED CODING; FULL-TEXT
DOI
10.1016/0306-4573(92)90043-Y
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Among the techniques used to retrieve information from free-text or document databases, signature methods have proven more efficient in terms of storage overhead and processing speed. Signature methods, however, suffer from "false drops": documents that the signature identifies as matches but that do not satisfy the user query. In signature approaches such as Word Signature and Superimposed Coding, the number of false drops depends directly on the chosen hashing function, the signature size, and the number of signature buffers used for each document. Hashing functions also produce collisions, which in turn cause false drops. In addition, these signature methods ignore word length and the positions of the characters that make up each word, so "Don't Care Characters" cannot be used in queries. This paper presents a new signature approach in which the size of the signature file depends on the number of unique symbols in the alphabet; for all documents containing English text, the size is therefore constant. The signature generated by this technique preserves the positional information of characters and therefore allows Don't Care Characters to be used in queries. Implementation results and a comparison of this technique with the Superimposed Coding method are presented.
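The contrast drawn in the abstract can be illustrated with a small Python sketch. The first half is plain superimposed coding: each word is hashed to a few bits of a fixed-width signature and the word masks are OR-ed together. The second half is a hypothetical alphabet-by-position bit table, included only to show how keeping character positions makes Don't Care queries checkable; it is not the authors' construction. The signature width `F`, the bits-per-word `K`, the SHA-256-based hashing, the `MAX_LEN` cap, and the `?` wildcard are all illustrative assumptions, not values taken from the paper.

```python
import hashlib
import string

# ---- Baseline: superimposed coding ----
F = 32  # signature width in bits (assumed, for illustration)
K = 2   # bits set per word (assumed)


def word_mask(word: str) -> int:
    """Hash a word to an F-bit mask with at most K bits set."""
    mask = 0
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % F)
    return mask


def sc_signature(words) -> int:
    """Superimpose (bitwise OR) the masks of all words in a document."""
    sig = 0
    for w in words:
        sig |= word_mask(w)
    return sig


def sc_may_contain(sig: int, word: str) -> bool:
    """True means 'candidate only': the answer may be a false drop."""
    q = word_mask(word)
    return (sig & q) == q


# ---- Hypothetical positional signature (NOT the authors' exact scheme) ----
ALPHABET = string.ascii_lowercase
MAX_LEN = 12    # hypothetical cap on the indexed word length
WILDCARD = "?"  # hypothetical Don't Care character


def pos_signature(words) -> dict:
    """One MAX_LEN-bit row per alphabet symbol; bit i is set if the symbol
    occurs at position i of some word. The size is len(ALPHABET) * MAX_LEN
    bits regardless of how long the document is."""
    rows = {c: 0 for c in ALPHABET}
    for w in words:
        for i, c in enumerate(w.lower()[:MAX_LEN]):
            if c in rows:
                rows[c] |= 1 << i
    return rows


def pos_may_contain(rows: dict, pattern: str) -> bool:
    """Check each indexed character at its position. Wildcards, and characters
    outside the alphabet (which were never indexed), cannot rule a document out."""
    for i, c in enumerate(pattern.lower()[:MAX_LEN]):
        if c == WILDCARD or c not in rows:
            continue
        if ((rows[c] >> i) & 1) == 0:
            return False
    return True


if __name__ == "__main__":
    doc = ["signature", "files", "cat"]
    print(sc_may_contain(sc_signature(doc), "cat"))  # True: no false dismissals
    rows = pos_signature(doc)
    print(pos_may_contain(rows, "c?t"))  # True
    print(pos_may_contain(rows, "dog"))  # False: no word starts with 'd'
    print(pos_may_contain(rows, "fat"))  # True, but a false drop
```

In this sketch `pos_may_contain(rows, "fat")` returns True even though the document has no word "fat": the bits come from "files" and "cat". As with any signature file, a positive answer only marks a candidate, and the document text must still be scanned to reject false drops.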
Pages: 153-163
Number of pages: 11
Related papers
  • [31] Automated Information Extraction from Free-Text EEG Reports
    Biswal, Siddharth
    Nip, Zarina
    Moura Junior, Valdery
    Bianchi, Matt T.
    Rosenthal, Eric S.
    Westover, M. Brandon
    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2015 : 6804 - 6807
  • [32] Syndromic surveillance from free-text triage chief complaints
    Chapman, Wendy W.
    Wagner, Michael M.
    Ivanov, Oleg
    Olszewski, Robert
    Dowling, John N.
    Journal of Urban Health, 2003, 80 (Suppl 1) : i120 - i120
  • [33] Reproducibly estimating drug exposure from free-text prescriptions
    Dixon, William
    Yimer, Belay
    Selby, David
    Jani, Meghna
    Lunt, Mark
    Nenadic, Goran
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2022, 31 : 87 - 87
  • [34] CANCER REPORTING FROM OCR FREE-TEXT PATHOLOGY REPORTS
    Zuccon, Guido
    Nguyen, Anthony
    Bergheim, Anton
    Grayson, Narelle
    ASIA-PACIFIC JOURNAL OF CLINICAL ONCOLOGY, 2012, 8 : 327 - 328
  • [35] Fever detection from free-text clinical records for biosurveillance
    Chapman, WW
    Dowling, JN
    Wagner, MM
    JOURNAL OF BIOMEDICAL INFORMATICS, 2004, 37 (02) : 120 - 127
  • [36] NAMING NOTES - TRANSITIONS FROM FREE-TEXT TO STRUCTURED ENTRY
    GREGORY, J
    MATTISON, JE
    LINDE, C
    METHODS OF INFORMATION IN MEDICINE, 1995, 34 (1-2) : 57 - 67
  • [37] Classification of cancer stage from free-text histology reports
    McCowan, Iain
    Moore, Darren
    Fry, Mary-Jane
    2006 28TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-15, 2006 : 922+
  • [38] Data Mining from Free-Text Health Records: State of the Art, New Polish Corpus
    Anetta, Kristof
    RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING (RASLAN 2020), 2020 : 13 - 22
  • [39] A text mining approach to assist the general public in the retrieval of legal documents
    Chen, Yen-Liang
    Liu, Yi-Hung
    Ho, Wu-Liang
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2013, 64 (02) : 280 - 290
  • [40] Proposing New RadLex Terms by Analyzing Free-Text Mammography Reports
    Bulu, Hakan
    Sippo, Dorothy A.
    Lee, Janie M.
    Burnside, Elizabeth S.
    Rubin, Daniel L.
    JOURNAL OF DIGITAL IMAGING, 2018, 31 (05) : 596 - 603