Uniform random sampling not recommended for large graph size estimation

Cited by: 6
Authors
Lu, Jianguo [1]
Wang, Hao [1]
Affiliations
[1] Univ Windsor, Sch Comp Sci, Windsor, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
NETWORKS
DOI
10.1016/j.ins.2017.08.030
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The norm in data size estimation is to use uniform random samples whenever possible. Tremendous effort has gone into obtaining uniform random samples using methods such as Metropolis-Hastings random walk or importance sampling [2]. This paper shows that, contrary to common practice, uniform random sampling should be avoided when PPS (probability proportional to size) sampling is available for large data. To develop intuition for the sampling process, we discuss the sampling and estimation problem in the context of graphs. The size is the number of nodes in the graph; uniform random sampling corresponds to uniform random node (RN) sampling; and PPS sampling is approximated by random edge (RE) sampling. In this setting, we show that for large graphs RE sampling outperforms RN sampling by a ratio proportional to the normalized graph degree variance. This result is particularly important in the era of big data, when data are typically large and scale-free [3], resulting in large degree variance. We derive the result by giving the variances of the RN and RE estimators. Each step of the derivation is supported and demonstrated by simulation studies assuming power law distributions. We then use 18 real-world networks to verify the result. Furthermore, we show that the performance of random walk (RW) sampling is data dependent and can be significantly worse than that of RN and RE. More specifically, RW can estimate the size of online social networks but not of Web graphs, because of the difference in graph conductance. Crown Copyright (C) 2017 Published by Elsevier Inc. All rights reserved.
Pages: 136 - 153
Page count: 18
Related papers
50 in total
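The abstract contrasts size estimation from uniform random node (RN) samples with estimation from random edge (RE) samples, which select nodes with probability proportional to degree. The abstract itself gives no formulas, so the following is only a minimal sketch under stated assumptions: it uses the standard collision-based (birthday-paradox) estimator for uniform samples and the standard degree-reweighted collision estimator for degree-proportional samples, which may differ from the estimators analyzed in the paper. The synthetic power-law-like degree sequence and the function names (count_collisions, estimate_rn, estimate_re, simulate) are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def count_collisions(sample):
    """Count unordered pairs of sample positions that hold the same node."""
    return sum(c * (c - 1) // 2 for c in Counter(sample).values())


def estimate_rn(sample):
    """Size estimate from uniform node samples drawn with replacement.

    Assumed estimator: n(n-1) / (2C), where C is the collision count.
    """
    n = len(sample)
    c = count_collisions(sample)
    return n * (n - 1) / (2 * c) if c else float("inf")


def estimate_re(sample, degree):
    """Size estimate from degree-proportional samples drawn with replacement.

    Assumed estimator: (sum of d_i) * (sum of 1/d_i) / (2C), which reweights
    the collision count to undo the bias toward high-degree nodes.
    """
    c = count_collisions(sample)
    if c == 0:
        return float("inf")
    sum_d = sum(degree[v] for v in sample)
    sum_inv_d = sum(1.0 / degree[v] for v in sample)
    return sum_d * sum_inv_d / (2 * c)


def simulate(num_nodes=10_000, num_samples=2_000, seed=0):
    """Compare both estimators on a synthetic heavy-tailed degree sequence.

    Only degrees are simulated (no explicit edges): picking an endpoint of a
    uniformly random edge is equivalent to picking a node with probability
    proportional to its degree.
    """
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    # Hypothetical degree sequence: Pareto-distributed, at least 1.
    degree = [int(rng.paretovariate(1.5)) + 1 for _ in nodes]
    rn_sample = rng.choices(nodes, k=num_samples)                  # uniform nodes
    re_sample = rng.choices(nodes, weights=degree, k=num_samples)  # prop. to degree
    return estimate_rn(rn_sample), estimate_re(re_sample, degree)


if __name__ == "__main__":
    rn_est, re_est = simulate()
    print(f"true size 10000, RN estimate {rn_est:.0f}, RE estimate {re_est:.0f}")
```

Repeating simulate over many seeds and comparing the spread of the two estimates is one way to observe the variance gap the abstract describes. A single run is not enough to rank the two methods.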
  • [21] Global triangle estimation based on first edge sampling in large graph streams
    Changyong Yu
    Huimin Liu
    Fazal Wahab
    Zihan Ling
    Tianmei Ren
    Haitao Ma
    Yuhai Zhao
    The Journal of Supercomputing, 2023, 79 : 14079 - 14116
  • [22] ESTIMATION OF FREQUENCY BY RANDOM SAMPLING
    ISOKAWA, Y
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 1983, 35 (02) : 201 - 213
  • [23] On Set Size Distribution Estimation and the Characterization of Large Networks via Sampling
    Murai, Fabricio
    Ribeiro, Bruno
    Towsley, Don
    Wang, Pinghui
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2013, 31 (06) : 1017 - 1025
  • [24] BURST: Benchmarking uniform random sampling techniques
    Acher, Mathieu
    Perrouin, Gilles
    Cordy, Maxime
    SCIENCE OF COMPUTER PROGRAMMING, 2023, 226
  • [25] AN ALGORITHM FOR UNIFORM RANDOM SAMPLING OF POINTS IN AND ON A HYPERSPHERE
    GURALNIK, G
    ZEMACH, C
    WARNOCK, T
    INFORMATION PROCESSING LETTERS, 1985, 21 (01) : 17 - 21
  • [26] Node copying: A random graph model for effective graph sampling
    Regol, Florence
    Pal, Soumyasundar
    Sun, Jianing
    Zhang, Yingxue
    Geng, Yanhui
    Coates, Mark
    SIGNAL PROCESSING, 2022, 192
  • [27] On the Theorem of Uniform Recovery of Random Sampling Matrices
    Andersson, Joel
    Stromberg, Jan-Olov
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2014, 60 (03) : 1700 - 1710
  • [28] Exponential random graph model parameter estimation for very large directed networks
    Stivala, Alex
    Robins, Garry
    Lomi, Alessandro
    PLOS ONE, 2020, 15 (01)
  • [29] GRAPH SAMPLING: ESTIMATION OF DEGREE DISTRIBUTIONS
    Deri, Joya A.
    Moura, Jose M. F.
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013 : 6501 - 6505
  • [30] Supports estimation via graph sampling
    Wang, Xin
    Shi, Jun-Hao
    Zou, Jie-Jun
    Shen, Ling-Zhen
    Lan, Zhuo
    Fang, Yu
    Xie, Wen-Bo
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 240