IMPLEMENTATIONS OF PARTIAL DOCUMENT RANKING USING INVERTED FILES

Cited by: 15
Authors
WONG, WYP
LEE, DL
Affiliation
[1] Department of Computer and Information Science, Ohio State University, Columbus, OH 43210
DOI
10.1016/0306-4573(93)90085-R
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Most commercial text retrieval systems employ inverted files to improve retrieval speed. This paper is concerned with implementations of document ranking based on inverted files. Three heuristic methods for implementing the tf × idf weighting strategy, where tf stands for term frequency and idf for inverse document frequency, are studied. The basic idea of the heuristic methods is to process the query terms in an order such that as many top documents as possible can be identified without processing all of the query terms. The first heuristic was proposed by Smeaton and van Rijsbergen, and it serves as the basis for comparison with the other two heuristic methods, which are proposed in this paper. The three heuristics are evaluated and compared in experimental runs based on the number of disk accesses required for partial document ranking, in which the returned documents contain some, but not necessarily all, of the requested number of top documents. The results show that the proposed heuristic methods perform better than the method of Smeaton and van Rijsbergen in terms of retrieval accuracy, defined as the percentage of top documents obtained after a given number of disk accesses. For total document ranking, in which all of the requested number of top documents are guaranteed to be returned, none of the optimization techniques studied so far leads to a substantial performance gain. To realize the advantage of the proposed heuristics, two methods for estimating the retrieval accuracy are studied, and their accuracies and processing costs are compared. All the experimental runs are based on four test collections made available with the SMART system.
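The general idea of partial ranking over an inverted file can be illustrated with a minimal sketch. It is not taken from the paper: the abstract does not specify the three heuristics, so the processing order used here (decreasing idf), the idf formula log(N / df), the `max_terms` budget standing in for disk accesses, and all function names and sample documents are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def build_inverted_file(docs):
    """Map each term to a postings list {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def idf(index, term, n_docs):
    """Inverse document frequency; log(N / df) is one common variant (assumption)."""
    df = len(index.get(term, {}))
    return math.log(n_docs / df) if df else 0.0

def rank(index, n_docs, query_terms, top_k, max_terms=None):
    """Accumulate tf x idf scores, reading postings lists in decreasing idf order.

    max_terms caps how many postings lists are read; it is a hypothetical
    stand-in for a disk-access budget. None means read every list.
    """
    ordered = sorted(set(query_terms), key=lambda t: (-idf(index, t, n_docs), t))
    if max_terms is not None:
        ordered = ordered[:max_terms]
    scores = defaultdict(float)
    for term in ordered:
        weight = idf(index, term, n_docs)
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf * weight
    # Sort by score, breaking ties by document id for a deterministic result.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]

def retrieval_accuracy(partial, total):
    """Fraction of the true top documents that appear in the partial result."""
    return len(set(partial) & set(total)) / len(total) if total else 1.0

# Hypothetical four-document collection and three-term query.
docs = {
    "d1": "inverted file ranking",
    "d2": "inverted file compression",
    "d3": "document ranking heuristics ranking",
    "d4": "signature file",
}
index = build_inverted_file(docs)
query = ["ranking", "file", "heuristics"]

total = rank(index, len(docs), query, top_k=2)                  # all lists read
after_one = rank(index, len(docs), query, top_k=2, max_terms=1)
after_two = rank(index, len(docs), query, top_k=2, max_terms=2)
```

In this toy collection, reading only the highest-idf postings list recovers one of the two top documents, and reading two of the three lists already recovers both, which is the kind of saving partial ranking aims for. Whether a real collection behaves this way depends on the term ordering and the data.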
Pages: 647-669
Page count: 23