Measuring the validity of peer-to-peer data for information retrieval applications

Cited by: 2
Authors
Koenigstein, Noam
Shavitt, Yuval
Weinsberg, Ela [2 ]
Weinsberg, Udi [1 ]
Affiliations
[1] Tel Aviv Univ, Sch Elect Engn, Tel Aviv, Israel
[2] Tel Aviv Univ, Dept Ind Engn, Tel Aviv, Israel
Keywords
Peer-to-peer; Information retrieval; Measurement;
DOI
10.1016/j.comnet.2011.10.026
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Peer-to-peer (p2p) networks are being increasingly adopted as an invaluable resource for various information retrieval (IR) applications, including similarity estimation, content recommendation and trend prediction. However, these networks are usually extremely large and noisy, which raises doubts regarding the ability to actually extract sufficiently accurate information. This paper quantifies the measurement effort required to obtain and optimize the information obtained from p2p networks for the purpose of IR applications. We identify and measure inherent difficulties in collecting p2p data, namely, partial crawling, user-generated noise, sparseness, and popularity and localization of content and search queries. These aspects are quantified using music files shared in the Gnutella p2p network. We show that the power-law nature of the network makes it relatively easy to capture an accurate view of the popular content with little effort. However, some applications, like trend prediction, mandate collection of the data from the "long tail", hence a much more exhaustive crawl is needed. Furthermore, we show that content and search queries are highly localized, indicating that location-crossing conclusions require a widespread spatial crawl. Finally, we present techniques for overcoming noise originating from user-generated content and for filtering non-informative data, while minimizing information loss. (C) 2011 Elsevier B.V. All rights reserved.
Pages: 1092-1102
Number of pages: 11