Parallel generation of inverted files for distributed text collections

Cited by: 6

Authors:

Ribeiro-Neto, BA [1]

Kitajima, JP [1]

Navarro, G [1]

Ana, CRGS [1]

Ziviani, N [1]

Affiliation:

[1] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil

Source:

SCCC'98 - XVIII INTERNATIONAL CONFERENCE OF THE CHILEAN SOCIETY OF COMPUTER SCIENCE, PROCEEDINGS | 1998

Keywords:

DOI:

10.1109/SCCC.1998.730794

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm assumes a high-bandwidth network of workstations with a shared-nothing memory organization. The text collection is assumed to be evenly distributed among the disks of the various workstations. Compression is used to save space in main memory (where inverted lists are kept) and to save time when data have to be moved across the network. The average running cost of the algorithm is O(t/p), where t is the size of the whole text collection and p is the number of available processors. We implemented our algorithm and obtained experimental results. On a 100 Mbit/s switched Ethernet network of four 200 MHz Pentium Pro processors, each with 128 megabytes of RAM, we were able to invert 2 gigabytes of TREC documents in 15 minutes. We also propose an analytical model for the execution time of the algorithm.
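The abstract gives only the outline of the approach: local text partitions, in-memory inverted lists, and compression before data crosses the network. As a minimal sketch of that general idea, and not the authors' actual algorithm, the Python below simulates p shared-nothing workers inside one process. Every name in it (`local_invert`, `owner`, `parallel_invert`) is hypothetical, and both the variable-byte gap coding and the term-to-worker assignment rule are assumptions chosen for illustration; the abstract does not say which compression scheme or distribution rule the paper uses.

```python
from collections import defaultdict

def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, high bit set on the final byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        chunk[0] |= 0x80              # least significant group is written last
        out.extend(reversed(chunk))
    return bytes(out)

def vbyte_decode(data):
    numbers, n = [], 0
    for b in data:
        if b & 0x80:                  # final byte of the current number
            numbers.append((n << 7) | (b & 0x7F))
            n = 0
        else:
            n = (n << 7) | b
    return numbers

def compress_postings(doc_ids):
    """Store a sorted doc-id list as gaps, then variable-byte encode the gaps."""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)

def decompress_postings(data):
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids

def local_invert(docs):
    """Invert one worker's local partition: term -> sorted list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def owner(term, p):
    """Assumed rule: a deterministic function of the term picks the owning worker."""
    return sum(term.encode("utf-8")) % p

def parallel_invert(partitions):
    """Simulate p workers sequentially; returns one {term: compressed list} per worker."""
    p = len(partitions)
    # Phase 1: each worker inverts only its local documents.
    local = [local_invert(docs) for docs in partitions]
    # Phase 2: each worker "sends" compressed lists to the worker owning the term.
    inbox = [defaultdict(list) for _ in range(p)]
    for index in local:
        for term, ids in index.items():
            inbox[owner(term, p)][term].append(compress_postings(ids))
    # Phase 3: each owner merges the pieces it received into one list per term.
    result = []
    for box in inbox:
        merged = {}
        for term, blobs in box.items():
            ids = sorted(set().union(*(decompress_postings(b) for b in blobs)))
            merged[term] = compress_postings(ids)
        result.append(merged)
    return result

partitions = [
    {1: "parallel inverted files", 2: "inverted lists"},
    {3: "parallel text", 4: "inverted text files"},
]
shards = parallel_invert(partitions)
combined = {t: decompress_postings(b) for shard in shards for t, b in shard.items()}
```

In this toy setup each worker scans only its own share of the text, which is where a per-processor cost proportional to t/p would come from when the collection is evenly distributed; the simulation runs the phases one after another and does not model network or disk behavior.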

Pages: 149-157

Page count: 5
