Inverted files versus signature files for text indexing

Cited by: 177
Authors
Zobel, J
Moffat, A
Ramamohanarao, K
Affiliations
[1] RMIT Univ, Dept Comp Sci, Melbourne, Vic 3001, Australia
[2] Univ Melbourne, Dept Comp Sci, Parkville, Vic 3052, Australia
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 1998 / Vol. 23 / No. 4
Keywords
indexing; inverted files; performance; signature files; text databases; text indexing
DOI
10.1145/296854.277632
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Two well-known indexing methods are inverted files and signature files. We have undertaken a detailed comparison of these two approaches in the context of text indexing, paying particular attention to query evaluation speed and space requirements. We have examined their relative performance using both experimentation and a refined approach to modeling of signature files, and demonstrate that inverted files are distinctly superior to signature files. Not only can inverted files be used to evaluate typical queries in less time than can signature files, but inverted files require less space and provide greater functionality. Our results also show that a synthetic text database can provide a realistic indication of the behavior of an actual text database. The tools used to generate the synthetic database have been made publicly available.
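As a rough illustration of the two index types the abstract compares, the following minimal sketch builds a toy inverted file and a toy signature file over the same three documents. It is not the authors' implementation: the sample documents, the function names, the 64-bit signature width, the three bits set per term, and the MD5-based hashing are all assumptions chosen for the example. Signature matching can report false matches, so the sketch verifies each candidate against the document text.

```python
import hashlib
from collections import defaultdict

# Hypothetical sample collection, invented for this sketch.
DOCS = {
    1: "inverted files index text",
    2: "signature files hash text",
    3: "inverted files need less space",
}

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def term_signature(term, width=64, bits_per_term=3):
    """Hash a term to a bit mask with up to bits_per_term bits set (assumed scheme)."""
    sig = 0
    for i in range(bits_per_term):
        digest = hashlib.md5(f"{i}:{term}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % width)
    return sig

def build_signatures(docs):
    """Superimpose (bitwise OR) the term signatures of each document."""
    sigs = {}
    for doc_id, text in docs.items():
        sig = 0
        for term in set(text.lower().split()):
            sig |= term_signature(term)
        sigs[doc_id] = sig
    return sigs

def query_inverted(index, terms):
    """Conjunctive query: intersect the posting sets of all query terms."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def query_signatures(sigs, docs, terms):
    """Conjunctive query: filter by signature, then discard false matches."""
    query_sig = 0
    for t in terms:
        query_sig |= term_signature(t)
    candidates = [d for d, s in sigs.items() if s & query_sig == query_sig]
    return {d for d in candidates
            if all(t in docs[d].lower().split() for t in terms)}

INDEX = build_inverted_index(DOCS)
SIGS = build_signatures(DOCS)
print(query_inverted(INDEX, ["inverted", "files"]))
print(query_signatures(SIGS, DOCS, ["inverted", "files"]))
```

In this sketch the inverted file answers a query by reading only the postings of the query terms, while the signature file scans every document signature and then needs a verification step. That difference is only a schematic hint at the costs the paper measures, not a reproduction of its results.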
Pages: 453-490
Page count: 38
Related papers
50 in total
  • [1] Self-indexing inverted files for fast text retrieval
    Moffat, A
    Zobel, J
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1996, 14 (04) : 349 - 379
  • [2] COMPARISON OF SIGNATURE AND INVERTED FILES
    NELSON, MJ
    CANADIAN JOURNAL OF INFORMATION SCIENCE-REVUE CANADIENNE DES SCIENCES DE L INFORMATION, 1988, 13 (3-4): : 79 - 89
  • [3] Comparing inverted files and signature files for searching a large lexicon
    Carterette, B
    Can, F
    INFORMATION PROCESSING & MANAGEMENT, 2005, 41 (03) : 613 - 633
  • [4] Improved self-indexing inverted files for full-text retrieval
    College of Computer Science, South-Central University for Nationalities, Wuhan 430074, China
    [Unknown]
    J. Comput. Inf. Syst., 2009, 2 (1017-1024):
  • [5] Using inverted files to compress text
    Ristov, Strahil
    Journal of Computing and Information Technology, 2002, 10 (03) : 157 - 161
  • [6] Using inverted files to compress text
    Ristov, S
    ITI 2002: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES, 2002, : 443 - 447
  • [7] Inverted files for text search engines
    Zobel, Justin
    Moffat, Alistair
    ACM COMPUTING SURVEYS, 2006, 38 (02)
  • [8] COMPLETE INVERTED FILES FOR EFFICIENT TEXT RETRIEVAL AND ANALYSIS
    BLUMER, A
    BLUMER, J
    HAUSSLER, D
    MCCONNELL, R
    EHRENFEUCHT, A
    JOURNAL OF THE ACM, 1987, 34 (03) : 578 - 595
  • [9] Optimistic concurrency control for inverted files in text databases
    Marín, M
    Proceedings of the IASTED International Conference on Databases and Applications, 2004, : 31 - 36
  • [10] Parallel generation of inverted files for distributed text collections
    Ribeiro-Neto, BA
    Kitajima, JP
    Navarro, G
    Ana, CRGS
    Ziviani, N
    SCCC'98 - XVIII INTERNATIONAL CONFERENCE OF THE CHILEAN SOCIETY OF COMPUTER SCIENCE, PROCEEDINGS, 1998, : 149 - 157