Measuring the validity of peer-to-peer data for information retrieval applications

Cited by: 2
Authors
Koenigstein, Noam
Shavitt, Yuval
Weinsberg, Ela [2]
Weinsberg, Udi [1]
Affiliations
[1] Tel Aviv Univ, Sch Elect Engn, Tel Aviv, Israel
[2] Tel Aviv Univ, Dept Ind Engn, Tel Aviv, Israel
Keywords
Peer-to-peer; Information retrieval; Measurement
DOI
10.1016/j.comnet.2011.10.026
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Peer-to-peer (p2p) networks are increasingly adopted as a resource for various information retrieval (IR) applications, including similarity estimation, content recommendation, and trend prediction. However, these networks are usually extremely large and noisy, which raises doubts about whether sufficiently accurate information can actually be extracted from them. This paper quantifies the measurement effort required to obtain and optimize information from p2p networks for IR applications. We identify and measure inherent difficulties in collecting p2p data, namely partial crawling, user-generated noise, sparseness, and the popularity and localization of content and search queries. These aspects are quantified using music files shared in the Gnutella p2p network. We show that the power-law nature of the network makes it possible to capture an accurate view of the popular content with relatively little effort. However, some applications, such as trend prediction, require data from the "long tail", and hence a much more exhaustive crawl. Furthermore, we show that content and search queries are highly localized, indicating that conclusions spanning locations require a geographically widespread crawl. Finally, we present techniques for overcoming noise originating from user-generated content and for filtering non-informative data while minimizing information loss. © 2011 Elsevier B.V. All rights reserved.
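The abstract's claim that popular content is cheap to capture while the "long tail" needs a far more exhaustive crawl can be illustrated with a small simulation. The sketch below is not the authors' method or data: it assumes item popularity follows a Zipf distribution with exponent 1, models a partial crawl as independent draws from that distribution, and uses hypothetical names (`zipf_weights`, `crawl`, `coverage`) and arbitrary sizes chosen only for illustration.

```python
import random


def zipf_weights(n_items, exponent=1.0):
    """Unnormalized Zipf popularity: the item at rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]


def crawl(n_items, n_observations, exponent=1.0, seed=0):
    """Simulate a partial crawl (assumption: each observation is an
    independent draw of one shared item, weighted by popularity).
    Returns the set of item indices seen (index 0 = most popular)."""
    rng = random.Random(seed)
    weights = zipf_weights(n_items, exponent)
    observed = rng.choices(range(n_items), weights=weights, k=n_observations)
    return set(observed)


def coverage(seen, items):
    """Fraction of the given item indices that the crawl observed."""
    items = list(items)
    return sum(1 for item in items if item in seen) / len(items)


if __name__ == "__main__":
    n_items = 10_000  # hypothetical catalog size
    for n_obs in (1_000, 10_000, 100_000):
        seen = crawl(n_items, n_obs)
        head = coverage(seen, range(100))                    # 100 most popular items
        tail = coverage(seen, range(n_items // 2, n_items))  # least popular half
        print(f"observations={n_obs:>7}  head={head:.2f}  tail={tail:.2f}")
```

Under these assumptions, head coverage saturates with a small number of observations, whereas tail coverage keeps growing with crawl size, which is the qualitative behavior the abstract describes.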
Pages: 1092-1102
Number of pages: 11