Measuring the validity of peer-to-peer data for information retrieval applications

Cited by: 2
Authors
Koenigstein, Noam
Shavitt, Yuval
Weinsberg, Ela [2]
Weinsberg, Udi [1]
Affiliations
[1] Tel Aviv Univ, Sch Elect Engn, Tel Aviv, Israel
[2] Tel Aviv Univ, Dept Ind Engn, Tel Aviv, Israel
Keywords
Peer-to-peer; Information retrieval; Measurement
DOI
10.1016/j.comnet.2011.10.026
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Peer-to-peer (p2p) networks are being increasingly adopted as an invaluable resource for various information retrieval (IR) applications, including similarity estimation, content recommendation and trend prediction. However, these networks are usually extremely large and noisy, which raises doubts regarding the ability to actually extract sufficiently accurate information. This paper quantifies the measurement effort required to obtain and optimize the information collected from p2p networks for the purpose of IR applications. We identify and measure inherent difficulties in collecting p2p data, namely, partial crawling, user-generated noise, sparseness, and popularity and localization of content and search queries. These aspects are quantified using music files shared in the Gnutella p2p network. We show that the power-law nature of the network makes it relatively easy to capture an accurate view of the popular content using relatively little effort. However, some applications, like trend prediction, mandate collection of the data from the "long tail", hence a much more exhaustive crawl is needed. Furthermore, we show that content and search queries are highly localized, indicating that location-crossing conclusions require a widely spread spatial crawl. Finally, we present techniques for overcoming noise originating from user-generated content and for filtering non-informative data, while minimizing information loss. © 2011 Elsevier B.V. All rights reserved.
Pages: 1092-1102
Page count: 11