Uniform random sampling not recommended for large graph size estimation

Cited by: 6
Authors
Lu, Jianguo [1]
Wang, Hao [1]
Affiliations
[1] Univ Windsor, Sch Comp Sci, Windsor, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
NETWORKS
DOI
10.1016/j.ins.2017.08.030
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The norm in data size estimation is to use uniform random samples whenever possible. There have been tremendous efforts to obtain uniform random samples using methods such as Metropolis-Hastings random walk or importance sampling [2]. This paper shows that, contrary to common practice, uniform random sampling should be avoided when PPS (probability proportional to size) sampling is available for large data. To develop intuition for the sampling process, we discuss the sampling and estimation problem in the context of graphs. The size is the number of nodes in the graph; uniform random sampling corresponds to uniform random node (RN) sampling; and PPS sampling is approximated by random edge (RE) sampling. In this setting, we show that for large graphs RE sampling outperforms RN sampling by a ratio proportional to the normalized graph degree variance. This result is particularly important in the era of big data, when data are typically large and scale-free [3], resulting in large degree variance. We derive the result by giving the variances of the RN and RE estimators. Each step of the derivation is supported and demonstrated by simulation studies assuming power-law distributions. We then use 18 real-world networks to verify the result. Furthermore, we show that the performance of random walk (RW) sampling is data dependent and can be significantly worse than that of RN and RE. More specifically, RW can estimate the size of online social networks but not of Web graphs, owing to differences in graph conductance. Crown Copyright (C) 2017 Published by Elsevier Inc. All rights reserved.
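The RN versus RE comparison in the abstract can be tried out with a small simulation. The sketch below is only an illustration under stated assumptions: the abstract does not give the estimators, so the code uses collision-counting estimators of the kind commonly used for size estimation (a birthday-paradox count for uniform node draws, and a version reweighted by 1/degree for degree-proportional draws). The function names, the capped Pareto degree sequence, and the sample sizes are all hypothetical choices, not taken from the paper.

```python
import random
from collections import Counter


def collisions(sample):
    """Number of unordered pairs of draws that landed on the same node."""
    return sum(c * (c - 1) // 2 for c in Counter(sample).values())


def rn_estimate(sample):
    """Node-count estimate from uniform random node (RN) draws.

    Assumed form: n(n-1) / (2C), where C is the collision count.
    """
    n, c = len(sample), collisions(sample)
    return n * (n - 1) / (2 * c) if c else float("inf")


def re_estimate(sample, degree):
    """Node-count estimate from degree-proportional (RE-style) draws.

    Assumed form: (sum of degrees) * (sum of 1/degree) / (2C).
    """
    c = collisions(sample)
    if c == 0:
        return float("inf")
    sum_deg = sum(degree[v] for v in sample)
    sum_inv = sum(1 / degree[v] for v in sample)
    return sum_deg * sum_inv / (2 * c)


def simulate(num_nodes=10_000, sample_size=2_000, seed=0):
    """Draw one RN sample and one RE-style sample; return both estimates."""
    rng = random.Random(seed)
    # Synthetic heavy-tailed degrees (assumption: capped Pareto, not real data).
    degree = [min(int(rng.paretovariate(1.5)), 1_000) for _ in range(num_nodes)]
    nodes = range(num_nodes)
    # RN: every node is equally likely.
    rn_sample = rng.choices(nodes, k=sample_size)
    # RE-style: a node is drawn with probability proportional to its degree,
    # as when picking an endpoint of a uniformly chosen edge.
    re_sample = rng.choices(nodes, weights=degree, k=sample_size)
    return rn_estimate(rn_sample), re_estimate(re_sample, degree)
```

Calling `simulate()` with several different seeds and comparing the spread of the two estimates around `num_nodes` gives a rough feel for the variance gap the abstract describes; the size of that gap depends on the assumed degree distribution.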
Pages: 136-153
Page count: 18
Related papers
50 in total
  • [1] Uniform Random Sampling Not Recommended
    Lu, Jianguo
    Wang, Hao
    Li, Dingding
    COMPANION PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2018 (WWW 2018), 2018, : 495 - 499
  • [2] GUISE: Uniform Sampling of Graphlets for Large Graph Analysis
    Bhuiyan, Mansurul A.
    Rahman, Mahmudur
    Rahman, Mahmuda
    Al Hasan, Mohammad
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012), 2012, : 91 - 100
  • [3] Accelerating Graph Mining Algorithms via Uniform Random Edge Sampling
    Gao, Ruohan
    Xu, Huanle
    Hu, Pili
    Lau, Wing Cheong
    2016 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2016,
  • [4] A large deviation principle for the Erdős-Rényi uniform random graph
    Dembo, Amir
    Lubetzky, Eyal
    ELECTRONIC COMMUNICATIONS IN PROBABILITY, 2018, 23
  • [5] On Random Sampling in Uniform Hypergraphs
    Czygrinow, Andrzej
    Nagle, Brendan
    RANDOM STRUCTURES & ALGORITHMS, 2011, 38 (04) : 422 - 440
  • [6] Snowball sampling for estimating exponential random graph models for large networks
    Stivala, Alex D.
    Koskinen, Johan H.
    Rolls, David A.
    Wang, Peng
    Robins, Garry L.
    SOCIAL NETWORKS, 2016, 47 : 167 - 188
  • [7] Random Sampling Method of Large-Scale Graph Data Classification
    Mustafa, Rashed
    Mahmud, Mohammad Sultan
    Shadid, Mahir
    JURNAL KEJURUTERAAN, 2024, 36 (02): : 525 - 532
  • [8] Can Quantum Computing Improve Uniform Random Sampling of Large Configuration Spaces?
    Ammermann, Joshua
    Bittner, Tim
    Eichhorn, Domenik
    Schaefer, Ina
    Seidl, Christoph
    2023 IEEE/ACM 4TH INTERNATIONAL WORKSHOP ON QUANTUM SOFTWARE ENGINEERING, Q-SE, 2023, : 34 - 41
  • [9] Estimation of Exponential Random Graph Models for Large Social Networks via Graph Limits
    He, Ran
    Zheng, Tian
    2013 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM), 2013, : 254 - 261
  • [10] Conditional estimation of exponential random graph models from snowball sampling designs
    Pattison, Philippa E.
    Robins, Garry L.
    Snijders, Tom A. B.
    Wang, Peng
    JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2013, 57 (06) : 284 - 296