A NEW SIGNATURE APPROACH FOR RETRIEVAL OF DOCUMENTS FROM FREE-TEXT DATABASES

Cited by: 2
Authors
TAVAKOLI, N
RAY, A
Affiliation
[1] Department of Computer Science, University of North Carolina at Charlotte, Charlotte
Keywords
SIGNATURE; FALSE DROP; INVERSION; SUPERIMPOSED CODING; FULL-TEXT;
DOI
10.1016/0306-4573(92)90043-Y
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Among the techniques used for retrieval of information from free-text or document databases, signature methods have proven to be among the more efficient in terms of storage overhead and processing speed. Signature methods, however, suffer from "false drops," in which a document is retrieved even though it does not satisfy the user query. In signature approaches such as Word Signature and Superimposed Coding, the number of false drops is directly related to the hashing function selected, the signature size, and the number of signature buffers used for each document. Hashing functions also generate collisions, which result in false drops. In addition, these signature methods do not take into account the length of the words or the positional information of the characters that constitute each word. The use of "Don't Care Characters" in queries is therefore not possible. This paper presents a new signature approach in which the size of the signature file depends on the number of unique symbols in the alphabet; for all documents containing English text, the size is therefore constant. The signature generated by this technique maintains the positional information of characters and therefore allows Don't Care Characters to be used in queries. Implementation results and a comparison of this technique with the Superimposed Coding method are presented.
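The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the details of either scheme, so the signature width (SIG_BITS), the number of bits set per word (BITS_PER_WORD), the SHA-256-based hash, the "?" wildcard symbol, and all function names below are assumptions chosen for illustration. The first half shows a generic superimposed-coding filter, where a match means only "possibly present" and can therefore be a false drop. The second half shows one hypothetical way to keep word length and character positions (one position bitmask per symbol), which makes don't-care matching possible.

```python
import hashlib

SIG_BITS = 64       # assumed signature width, not taken from the paper
BITS_PER_WORD = 3   # assumed number of bits set per word


def word_signature(word: str) -> int:
    """Hash a word to up to BITS_PER_WORD bit positions of a SIG_BITS-wide mask."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def document_signature(words) -> int:
    """Superimpose (bitwise OR) the signatures of all words in a document."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig


def may_contain(doc_sig: int, word: str) -> bool:
    """True if every bit of the word's signature is set in the document's.

    A True result can be a false drop; a False result is always correct.
    """
    q = word_signature(word)
    return doc_sig & q == q


def positional_signature(word: str) -> dict:
    """Hypothetical positional signature: per symbol, a bitmask of positions.

    Bit i of masks[ch] is set when ch occurs at position i of the word.
    """
    masks = {}
    for i, ch in enumerate(word.lower()):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    return {"length": len(word), "masks": masks}


def matches(sig: dict, pattern: str, wildcard: str = "?") -> bool:
    """Match a pattern against a positional signature; wildcard is a don't-care."""
    if sig["length"] != len(pattern):
        return False
    for i, ch in enumerate(pattern.lower()):
        if ch == wildcard:
            continue  # don't-care position: any character is accepted
        if not (sig["masks"].get(ch, 0) >> i) & 1:
            return False
    return True
```

For example, matches(positional_signature("signature"), "s?gn?ture") accepts the word, while the hashed signature in the first half has no way to answer such a query, because the hash of "s?gn?ture" is unrelated to the hash of "signature".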
Pages: 153-163
Number of pages: 11
Related papers
50 in total
  • [41] Learning From Free-Text Human Feedback - Collect New Datasets Or Extend Existing Ones?
    Petraki, Dominic
    Moosavi, Nafise Sadat
    Tian, Ye
    Rozanov, Nikolai
    Gurevych, Iryna
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 16259 - 16279
  • [42] Free-text medical document retrieval via phrase-based vector space model
    Mao, WL
    Chu, WW
    AMIA 2002 SYMPOSIUM, PROCEEDINGS: BIOMEDICAL INFORMATICS: ONE DISCIPLINE, 2002, : 489 - 493
  • [43] Proposing New RadLex Terms by Analyzing Free-Text Mammography Reports
    Hakan Bulu
    Dorothy A. Sippo
    Janie M. Lee
    Elizabeth S. Burnside
    Daniel L. Rubin
    Journal of Digital Imaging, 2018, 31 : 596 - 603
  • [44] Subject retrieval from full-text databases in the humanities
    East, John W.
    PORTAL-LIBRARIES AND THE ACADEMY, 2007, 7 (02) : 227 - 241
  • [45] Rule-based approach for identifying assertions in clinical free-text data
    Sun, Yue Kimi
    Nguyen, Anthony
    Sitbon, Laurianne
    Geva, Shlomo
    ADCS 2010 - Proceedings of the Fifteenth Australasian Document Computing Symposium, 2010, : 93 - 96
  • [46] A Text Mining Approach in the Classification of Free-Text Cancer Pathology Reports from the South African National Health Laboratory Services
    Achilonu, Okechinyere J.
    Olago, Victor
    Singh, Elvira
    Eijkemans, Rene M. J. C.
    Nimako, Gideon
    Musenge, Eustasius
    INFORMATION, 2021, 12 (11)
  • [47] Unsupervised identification of crime problems from police free-text data
    Birks, Daniel
    Coleman, Alex
    Jackson, David
    CRIME SCIENCE, 2020, 9 (01)
  • [48] Creating and indexing teaching files from free-text patient reports
    Johnson, DB
    Chu, WW
    Dionisio, JD
    Taira, RK
    Kangarloo, H
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 1999, : 814 - 818
  • [49] Extracting Cancer Mortality Statistics from Free-text Death Certificates
    Koopman, Bevan
    Nguyen, Anthony
    Cossio, Danica
    Courage, Mary-Jane
    Francois, Gary
    ADCS'18: PROCEEDINGS OF THE 23RD AUSTRALASIAN DOCUMENT COMPUTING SYMPOSIUM, 2018,
  • [50] Development of image-based decision support systems utilizing information extracted from radiological free-text report databases with text-based transformers
    Nowak, Sebastian
    Schneider, Helen
    Layer, Yannik C.
    Theis, Maike
    Biesner, David
    Block, Wolfgang
    Wulff, Benjamin
    Attenberger, Ulrike I.
    Sifa, Rafet
    Sprinkart, Alois M.
    EUROPEAN RADIOLOGY, 2024, 34 (05) : 2895 - 2904