A NEW SIGNATURE APPROACH FOR RETRIEVAL OF DOCUMENTS FROM FREE-TEXT DATABASES

Cited by: 2
Authors
TAVAKOLI, N
RAY, A
Affiliation
[1] Department of Computer Science, University of North Carolina at Charlotte, Charlotte
Keywords
SIGNATURE; FALSE DROP; INVERSION; SUPERIMPOSED CODING; FULL-TEXT;
DOI
10.1016/0306-4573(92)90043-Y
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Among the techniques used for retrieval of information from free-text or document databases, signature methods have proven to be more efficient in terms of storage overhead and processing speed. Signature methods, however, present the problem of "false drops," in which a document is identified as a match but does not satisfy the user query. In signature approaches such as Word Signature and Superimposed Coding, the number of false drops is directly related to the hashing function selected, the signature size, and the number of signature buffers used for each document. Hashing functions also generate collisions, which result in false drops. In addition, these signature methods do not take into account the length of the words or the positional information of the characters that constitute each word. The use of "Don't Care Characters" in queries is therefore not possible. This paper presents a new signature approach in which the size of the signature files depends on the number of unique symbols in the alphabet; for all documents containing English text, the size is therefore constant. The signature generated by this technique maintains the positional information of characters and therefore allows Don't Care Characters to be used in queries. Implementation results and a comparison of this technique with the Superimposed Coding method are presented.
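For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of generic superimposed coding, not the authors' new method and not their implementation. The signature width, the number of bits set per word, the MD5-based hash, and all function names are assumptions chosen for illustration. It shows why false drops can occur: a signature match only says a document may contain the word, so candidates must still be verified against the text.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed signature width, not taken from the paper
BITS_PER_WORD = 3    # assumed number of bit positions set per word


def word_signature(word: str) -> int:
    """Hash a word to a bit pattern with at most BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.md5(f"{i}:{word.lower()}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % SIGNATURE_BITS
        sig |= 1 << position
    return sig


def document_signature(text: str) -> int:
    """Superimpose (bitwise OR) the signatures of every word in the text."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def may_contain(doc_sig: int, query_word: str) -> bool:
    """True if every bit of the query signature is set in the document signature.

    A True result is only a candidate match; it can be a false drop.
    """
    query_sig = word_signature(query_word)
    return (doc_sig & query_sig) == query_sig


def search(docs: list[str], query_word: str) -> tuple[list[int], list[int]]:
    """Return (true hits, false drops) as lists of document indices."""
    hits, false_drops = [], []
    for index, text in enumerate(docs):
        if not may_contain(document_signature(text), query_word):
            continue  # the signature rules this document out
        if query_word.lower() in text.lower().split():
            hits.append(index)
        else:
            false_drops.append(index)  # signature matched, text did not
    return hits, false_drops


docs = [
    "signature files store hashed words",
    "inverted index lists every term",
]
hits, false_drops = search(docs, "signature")
print("hits:", hits, "false drops:", false_drops)
```

Because each word is hashed as a whole, the resulting bit pattern carries no information about word length or character positions, so a pattern such as "c?t" cannot be tested against it. This is the limitation the abstract attributes to these methods.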
Pages: 153-163
Number of pages: 11