Measuring the validity of peer-to-peer data for information retrieval applications

Cited by: 2
Authors
Koenigstein, Noam
Shavitt, Yuval
Weinsberg, Ela [2]
Weinsberg, Udi [1]
Affiliations
[1] Tel Aviv Univ, Sch Elect Engn, Tel Aviv, Israel
[2] Tel Aviv Univ, Dept Ind Engn, Tel Aviv, Israel
Keywords
Peer-to-peer; Information retrieval; Measurement;
DOI
10.1016/j.comnet.2011.10.026
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Peer-to-peer (p2p) networks are being increasingly adopted as an invaluable resource for various information retrieval (IR) applications, including similarity estimation, content recommendation and trend prediction. However, these networks are usually extremely large and noisy, which raises doubts regarding the ability to actually extract sufficiently accurate information. This paper quantifies the measurement effort required to obtain and optimize the information obtained from p2p networks for the purpose of IR applications. We identify and measure inherent difficulties in collecting p2p data, namely, partial crawling, user-generated noise, sparseness, and popularity and localization of content and search queries. These aspects are quantified using music files shared in the Gnutella p2p network. We show that the power-law nature of the network makes it relatively easy to capture an accurate view of the popular content using relatively little effort. However, some applications, like trend prediction, mandate collection of the data from the "long tail", hence a much more exhaustive crawl is needed. Furthermore, we show that content and search queries are highly localized, indicating that location-crossing conclusions require a widespread spatial crawl. Finally, we present techniques for overcoming noise originating from user-generated content and for filtering non-informative data, while minimizing information loss. © 2011 Elsevier B.V. All rights reserved.
Pages: 1092-1102
Number of pages: 11