IMPLEMENTATIONS OF PARTIAL DOCUMENT RANKING USING INVERTED FILES

Cited by: 15

Authors:

WONG, WYP

LEE, DL

Affiliation:

[1] Department of Computer and Information Science, Ohio State University, Columbus, OH 43210

Source:

INFORMATION PROCESSING & MANAGEMENT | 1993 / Vol. 29 / No. 5

Keywords:

DOI:

10.1016/0306-4573(93)90085-R

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Most commercial text retrieval systems employ inverted files to improve retrieval speed. This paper is concerned with the implementation of document ranking based on inverted files. Three heuristic methods for implementing the tf × idf weighting strategy, where tf stands for term frequency and idf stands for inverse document frequency, are studied. The basic idea of the heuristic methods is to process the query terms in an order such that as many top documents as possible can be identified without processing all of the query terms. The first heuristic was proposed by Smeaton and van Rijsbergen, and it serves as the basis for comparison with the other two heuristic methods, which are proposed in this paper. The three heuristics are evaluated and compared in experimental runs based on the number of disk accesses required for partial document ranking, in which the returned documents contain some, but not necessarily all, of the requested number of top documents. The results show that the proposed heuristic methods perform better than the method of Smeaton and van Rijsbergen in terms of retrieval accuracy, defined as the percentage of top documents obtained after a given number of disk accesses. For total document ranking, in which all of the requested number of top documents are guaranteed to be returned, none of the optimization techniques studied so far leads to a substantial performance gain. To realize the advantage of the proposed heuristics, two methods for estimating the retrieval accuracy are studied, and their accuracies and processing costs are compared. All the experimental runs are based on four test collections made available with the SMART system.
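
The abstract does not spell out the three heuristics, so the following is only a minimal sketch of the general idea it describes: rank documents by tf × idf over an inverted file, process the query terms in a chosen order, and stop early to obtain a partial ranking. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`build_inverted_file`, `rank`, `retrieval_accuracy`), the ordering (rarest term first, ties broken alphabetically), the `max_terms` cutoff, the logarithmic idf formula, and the use of one postings list as a stand-in for one disk access.

```python
import math
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to a postings dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def rank(index, n_docs, query_terms, k, max_terms=None):
    """Term-at-a-time tf x idf ranking.

    If max_terms is given, only that many postings lists are read,
    which yields a partial ranking (hypothetical cutoff rule).
    """
    # Assumed ordering: lowest document frequency (highest idf) first,
    # with the term itself as a tie-breaker for determinism.
    terms = sorted(
        (t for t in set(query_terms) if t in index),
        key=lambda t: (len(index[t]), t),
    )
    if max_terms is not None:
        terms = terms[:max_terms]

    scores = defaultdict(float)
    for term in terms:
        postings = index[term]
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf

    # Highest score first; doc_id breaks ties.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:k]


def retrieval_accuracy(partial, full):
    """Fraction of the true top documents present in the partial result."""
    if not full:
        return 1.0
    return len(set(partial) & set(full)) / len(full)


if __name__ == "__main__":
    docs = {
        1: "inverted file ranking",
        2: "inverted file compression",
        3: "document ranking heuristics ranking",
        4: "disk access cost",
    }
    query = ["ranking", "inverted", "heuristics"]
    index = build_inverted_file(docs)

    full = rank(index, len(docs), query, k=2)
    for budget in (1, 2, 3):
        partial = rank(index, len(docs), query, k=2, max_terms=budget)
        print(budget, partial, retrieval_accuracy(partial, full))
```

In this toy collection, reading more postings lists can only add documents to the candidate set, so the measured accuracy rises toward 1.0 as the budget grows; how quickly it rises depends on the term ordering, which is the quantity the heuristics in the abstract aim to improve.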


Pages: 647-669

Page count: 23


