A case study of distributed information retrieval architectures to index one terabyte of text

Times cited: 16
Authors
Cacheda, F
Plachouras, V
Ounis, I
Affiliations
[1] Univ A Coruna, Fac Informat, Dept Informat & Commun Technol, La Coruna 15071, Spain
[2] Univ Glasgow, Dept Comp Sci, Glasgow G12 8QQ, Lanark, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
distributed information retrieval; performance; simulation;
DOI
10.1016/j.ipm.2004.05.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
The increasing number of documents to be indexed in many environments (Web, intranets, digital libraries) and the limitations of a single centralised index (lack of scalability, server overloading and failures) lead to the use of distributed information retrieval systems to efficiently search and locate the desired information. This work is a case study of different architectures for a distributed information retrieval system, intended as a guide for approximating the optimal architecture for a specific set of resources. We analyse the effectiveness of a distributed, replicated and clustered architecture, simulating a variable number of workstations (from 1 up to 4096). A collection of approximately 94 million documents and 1 terabyte (TB) of text is used to test the performance of the different architectures. In a purely distributed information retrieval system, the brokers become the bottleneck due to the high number of local answer sets to be sorted. In a replicated system, the network is the bottleneck due to the high number of query servers and the continuous data interchange with the brokers. Finally, we demonstrate that a clustered system will outperform a replicated system if a high number of query servers is used, essentially due to the reduction of the network load. However, a change in the distribution of the users' queries could reduce the performance of a clustered system. (c) 2004 Elsevier Ltd. All rights reserved.
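To make the broker bottleneck concrete: in a purely distributed system, every query server returns its own ranked local answer set, and the broker must combine all of them into a single ranking. The sketch below is a minimal illustration under stated assumptions, not the authors' simulation model: it assumes each local answer set is a list of hypothetical `(score, doc_id)` pairs already sorted by descending score, and it uses a k-way heap merge (`heapq.merge`) as a stand-in for whatever sorting the brokers in the study perform. The function name `merge_local_answers` is hypothetical.

```python
import heapq
from itertools import islice
from typing import List, Tuple

# Hypothetical representation: a local answer set is a list of (score, doc_id)
# pairs, sorted by descending score by the query server that produced it.
AnswerSet = List[Tuple[float, str]]


def merge_local_answers(local_sets: List[AnswerSet], k: int) -> AnswerSet:
    """Merge per-server ranked lists into one global top-k list (broker side).

    heapq.merge keeps one heap entry per server, so after an O(S) setup each
    emitted result costs O(log S) comparisons, where S is the number of local
    answer sets. The broker's work therefore grows with the number of servers.
    """
    # reverse=True because every input list is sorted by descending score.
    merged = heapq.merge(*local_sets, key=lambda pair: pair[0], reverse=True)
    # Stop after k results instead of materialising the full merged ranking.
    return list(islice(merged, k))


if __name__ == "__main__":
    server_a = [(0.92, "d17"), (0.55, "d3")]
    server_b = [(0.88, "d42"), (0.61, "d8")]
    server_c = [(0.95, "d5")]
    print(merge_local_answers([server_a, server_b, server_c], k=3))
    # [(0.95, 'd5'), (0.92, 'd17'), (0.88, 'd42')]
```

A heap-based merge is only one design choice; a broker could instead concatenate all local answer sets and sort them, which costs O(n log n) in the total number of received results. Either way, the broker's load rises with the number of query servers, which is consistent with the bottleneck the abstract reports.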
Pages: 1141-1161
Number of pages: 21
Related papers
  • [21] A Text Mining-Based Approach for Analyzing Information Retrieval in Spanish: Music Data Collection as a Case Study
    Ramos-Gonzalez, Juan
    Martin-Gomez, Lucia
    DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 2019, 801 : 259 - 266
  • [22] Information social networks: one history and one case study
    Ferreira, Goncalo Costa
    PERSPECTIVAS EM CIENCIA DA INFORMACAO, 2011, 16 (03) : 208 - 231
  • [23] A multidimensional approach to the study of human-information interaction: A case study of collaborative information retrieval
    Fidel, R
    Pejtersen, AM
    Cleal, B
    Bruce, H
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2004, 55 (11) : 939 - 953
  • [24] A case study on freshness based scoring for fresh information retrieval
    Sato, N
    Uehara, M
    Sakai, Y
    IEEE INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES 2004 (ISCIT 2004), PROCEEDINGS, VOLS 1 AND 2: SMART INFO-MEDIA SYSTEMS, 2004 : 210 - 215
  • [25] A case study of a reusable component collection in the information retrieval domain
    Frakes, WB
    JOURNAL OF SYSTEMS AND SOFTWARE, 2004, 72 (02) : 265 - 270
  • [26] Multilingual information retrieval on the Internet: A case study of Turkish users
    Aytac, S
    INTERNATIONAL INFORMATION & LIBRARY REVIEW, 2005, 37 (04) : 275 - 284
  • [27] Thesaurus Performance with Information Retrieval: Schema Matching as A Case Study
    Sabbah, Thabit
    Selamat, Ali
    2013 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2013), 2013 : 4494 - 4498
  • [28] A library's information retrieval system (In) effectiveness: case study
    Marijan, Robert
    Leskovar, Robert
    LIBRARY HI TECH, 2015, 33 (03) : 369 - 386
  • [30] A study on the adaptive information retrieval framework with spatial and temporal information — A case of catering information service
    Wu, Jyun-Lin
    Yu, Ting-Jung
    Fu, Chen-Hua
    Journal of Technology, 2021, 36 (01) : 37 - 53