A case study of distributed information retrieval architectures to index one terabyte of text

Cited by: 16
Authors
Cacheda, F
Plachouras, V
Ounis, I
Affiliations
[1] Univ A Coruna, Fac Informat, Dept Informat & Commun Technol, La Coruna 15071, Spain
[2] Univ Glasgow, Dept Comp Sci, Glasgow G12 8QQ, Lanark, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
distributed information retrieval; performance; simulation
DOI
10.1016/j.ipm.2004.05.002
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The increasing number of documents to be indexed in many environments (Web, intranets, digital libraries) and the limitations of a single centralised index (lack of scalability, server overloading and failures) lead to the use of distributed information retrieval systems to efficiently search and locate the desired information. This work is a case study of different architectures for a distributed information retrieval system, in order to provide a guide to approximate the optimal architecture with a specific set of resources. We analyse the effectiveness of a distributed, replicated and clustered architecture simulating a variable number of workstations (from 1 up to 4096). A collection of approximately 94 million documents and 1 terabyte (TB) of text is used to test the performance of the different architectures. In a purely distributed information retrieval system, the brokers become the bottleneck due to the high number of local answer sets to be sorted. In a replicated system, the network is the bottleneck due to the high number of query servers and the continuous data interchange with the brokers. Finally, we demonstrate that a clustered system will outperform a replicated system if a high number of query servers is used, essentially due to the reduction of the network load. However, a change in the distribution of the users' queries could reduce the performance of a clustered system. (c) 2004 Elsevier Ltd. All rights reserved.
Pages: 1141 - 1161
Number of pages: 21
Related papers
50 in total
  • [31] Information Extraction with Active Learning: A Case Study in Legal Text
    Cardellino, Cristian
    Villata, Serena
    Alonso Alemany, Laura
    Cabrio, Elena
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT II, 2015, 9042 : 483 - 494
  • [32] The information resources utilization index: a case study in China
    Borjigin, Chaolemen
    Feng, Huiling
    Zhang, Bin
    Zhao, Guojun
    PROGRAM-ELECTRONIC LIBRARY AND INFORMATION SYSTEMS, 2016, 50 (01) : 2 - 15
  • [33] Building adaptive information retrieval systems for organizational memories: a case study
    Fortier, JY
    Kassel, G
    PROCEEDINGS OF THE ISCA 12TH INTERNATIONAL CONFERENCE INTELLIGENT AND ADAPTIVE SYSTEMS AND SOFTWARE ENGINEERING, 2003, : 50 - 53
  • [34] Requests for information from a film archive: a case study of multimedia retrieval
    Hertzum, M
    JOURNAL OF DOCUMENTATION, 2003, 59 (02) : 168 - 186
  • [35] INFORMATION-RETRIEVAL EVALUATION IN PRACTICE - A CASE-STUDY APPROACH
    SMITHSON, S
    INFORMATION PROCESSING & MANAGEMENT, 1994, 30 (02) : 205 - 221
  • [36] Measurement of Incompatible Probability in Information Retrieval: A Case Study with User Clicks
    王博
    侯越先
    Transactions of Tianjin University, 2013, 19 (01) : 37 - 42
  • [37] Distributed Architectures for Intensive Urban Computing: A Case Study on Smart Lighting for Sustainable Cities
    Mora, Higinio
    Peral, Jesus
    Ferrandez, Antonio
    Gil, David
    Szymanski, Julian
    IEEE ACCESS, 2019, 7 : 58449 - 58465
  • [38] Case study: Visualization and information retrieval techniques for network intrusion detection
    Atkison, T
    Pensy, K
    Nicholas, C
    Ebert, D
    Atkison, R
    Morris, C
    DATA VISUALIZATION 2001, 2001, : 283 - +
  • [39] Information retrieval in systematic reviews: a case study of the crime prevention literature
    Lisa Tompson
    Jyoti Belur
    Journal of Experimental Criminology, 2016, 12 : 187 - 207
  • [40] Applying information-retrieval methods to software reuse: a case study
    Stierna, EJ
    Rowe, NC
    INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (01) : 67 - 74