A NEW SIGNATURE APPROACH FOR RETRIEVAL OF DOCUMENTS FROM FREE-TEXT DATABASES

Cited by: 2
Authors
TAVAKOLI, N
RAY, A
Affiliation
[1] Department of Computer Science, University of North Carolina at Charlotte, Charlotte
Keywords
SIGNATURE; FALSE DROP; INVERSION; SUPERIMPOSED CODING; FULL-TEXT
DOI
10.1016/0306-4573(92)90043-Y
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Among the techniques used to retrieve information from free-text (document) databases, signature methods have proven more efficient than the alternatives in terms of storage overhead and processing speed. Signature methods, however, suffer from "false drops": documents that the signature comparison identifies as matches but that do not actually satisfy the user query. In signature approaches such as Word Signature and Superimposed Coding, the number of false drops depends directly on the hashing function selected, the signature size, and the number of signature buffers used for each document. Hash functions also produce collisions, which in turn cause false drops. In addition, these signature methods take into account neither the length of words nor the positions of the characters that make up each word, so "Don't Care Characters" cannot be used in queries. This paper presents a new signature approach in which the size of the signature file depends on the number of unique symbols in the alphabet; consequently, the size is constant for all documents containing English text. The signatures generated by this technique preserve the positional information of characters and therefore allow Don't Care Characters to be used in queries. Implementation results and a comparison of this technique with the Superimposed Coding method are presented.
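The two mechanisms contrasted in the abstract can be made concrete with a short Python sketch. The first half is a generic Superimposed Coding filter: each word is hashed to a few bits of a fixed-width mask and the masks are ORed together, so hash collisions and bit saturation produce false drops. The second half is a hypothetical position-aware signature meant only to illustrate the idea described in the abstract: one bitmask per alphabet symbol recording the positions where that symbol occurs, plus a word-length mask, so the signature size is fixed by the alphabet and `?` can serve as a Don't Care Character. All function names, the SHA-256 hashing, the `width=64`/`k=3` parameters, the 26-letter `ALPHABET`, and `MAX_LEN=16` are assumptions for illustration; this is not the authors' published layout.

```python
import hashlib
import string


# --- Superimposed Coding (a generic textbook variant, not the paper's exact setup) ---

def word_bits(word: str, width: int = 64, k: int = 3) -> int:
    """Hash a word to a width-bit mask with up to k bits set (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    mask = 0
    for i in range(k):
        # Two digest bytes per bit position; different words may collide on the same bits.
        mask |= 1 << (int.from_bytes(digest[2 * i:2 * i + 2], "big") % width)
    return mask


def superimposed_signature(words, width: int = 64, k: int = 3) -> int:
    """Superimpose (OR together) the masks of every word in the document."""
    sig = 0
    for w in words:
        sig |= word_bits(w, width, k)
    return sig


def sc_candidate(doc_sig: int, word: str, width: int = 64, k: int = 3) -> bool:
    """Candidate if all of the query word's bits are set.
    True may be a false drop; False is always correct."""
    q = word_bits(word, width, k)
    return (doc_sig & q) == q


# --- Hypothetical position-aware signature (illustrative layout, not the authors') ---

ALPHABET = string.ascii_lowercase  # assumed symbol set: 26 lowercase English letters
MAX_LEN = 16                       # assumed cap on recorded character positions


def positional_signature(words):
    """Per symbol, a bitmask whose bit p is set if some word has that symbol at position p,
    plus a bitmask of word lengths. Size is fixed by ALPHABET and MAX_LEN, not by the text."""
    letters = [0] * len(ALPHABET)
    lengths = 0
    for w in words:
        w = w.lower()
        lengths |= 1 << min(len(w), MAX_LEN)
        for p, ch in enumerate(w[:MAX_LEN]):
            idx = ALPHABET.find(ch)
            if idx >= 0:  # symbols outside ALPHABET are not recorded
                letters[idx] |= 1 << p
    return letters, lengths


def pattern_candidate(sig, pattern: str) -> bool:
    """'?' is the Don't Care Character; other symbols must occur at their positions.
    Bits from different words are merged, so True can still be a false drop."""
    letters, lengths = sig
    pattern = pattern.lower()
    if not (lengths >> min(len(pattern), MAX_LEN)) & 1:
        return False  # no word of this length in the document
    for p, ch in enumerate(pattern[:MAX_LEN]):
        idx = ALPHABET.find(ch)
        if ch == "?" or idx < 0:  # wildcard, or a symbol the signature cannot check
            continue
        if not (letters[idx] >> p) & 1:
            return False
    return True


if __name__ == "__main__":
    psig = positional_signature(["cat", "dog"])
    print(pattern_candidate(psig, "c?t"))  # True: "cat" matches
    print(pattern_candidate(psig, "dat"))  # True, a false drop: "d" and "at" come from different words
    print(pattern_candidate(psig, "?x?"))  # False: no "x" at position 1
```

Both functions are filters: every candidate they return still has to be checked against the document text to discard false drops. The `"dat"` case shows that merging positions across words can reintroduce false drops even in the position-aware variant, while positions and word length let it reject patterns such as `"?x?"` or `"cats"` that a per-word hash cannot express.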
Pages: 153-163
Number of pages: 11
Related papers (50 in total)
  • [1] MENU-DRIVEN RETRIEVAL IN FREE-TEXT DATABASES
    KAISER, D
    ONLINE & CDROM REVIEW, 1993, 17(02): 123
  • [2] HYPERMEDIA AND FREE-TEXT RETRIEVAL
    DUNLOP, MD
    VANRIJSBERGEN, CJ
    INFORMATION PROCESSING & MANAGEMENT, 1993, 29(03): 287-298
  • [3] Information extraction from free-text business documents
    Abramowicz, W
    Piskorski, J
    ISSUES AND TRENDS OF INFORMATION TECHNOLOGY MANAGEMENT IN CONTEMPORARY ORGANIZATIONS, VOLS 1 AND 2, 2002: 626-630
  • [4] DEVELOPMENTS IN FREE-TEXT RETRIEVAL-SYSTEMS
    MALLINSON, P
    JOURNAL OF THE SOCIETY OF ARCHIVISTS, 1993, 14(01): 55-64
  • [5] The phrase-based vector space model for automatic retrieval of free-text medical documents
    Mao, Wenlei
    Chu, Wesley W.
    DATA & KNOWLEDGE ENGINEERING, 2007, 61(01): 76-92
  • [6] Application-embedded retrieval from distributed free-text collections
    Kulyukin, Vladimir A.
    Proceedings of the National Conference on Artificial Intelligence, 1999: 447-452
  • [7] Application-embedded retrieval from distributed free-text collections
    Kulyukin, VA
    SIXTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-99)/ELEVENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-99), 1999: 447-452
  • [8] FREE-TEXT SEARCHING IN FULL-TEXT DATABASES - PROBING SYSTEM LIMITS
    SORMUNEN, E
    ONLINE & CDROM REVIEW, 1994, 18(02): 117
  • [9] Identifying Patients with Depression Using Free-text Clinical Documents
    Zhou, Li
    Baughman, Amy W.
    Lei, Victor J.
    Lai, Kenneth H.
    Navathe, Amol S.
    Chang, Frank
    Sordo, Margarita
    Topaz, Maxim
    Zhong, Feiran
    Murrali, Madhavan
    Navathe, Shamkant
    Rocha, Roberto A.
    MEDINFO 2015: EHEALTH-ENABLED HEALTH, 2015, 216: 629-633
  • [10] INFORMATION RETRIEVAL IN CLINICAL FREE TEXT DOCUMENTS
    Spat, S.
    Cadonna, B.
    Rakovac, I
    Guetl, C.
    Leitner, H.
    Stark, G.
    Beck, P.
    EHEALTH2008 - MEDICAL INFORMATICS MEETS EHEALTH, 2008: 205-210