Uniform random sampling not recommended for large graph size estimation

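Cited by: 6
Authors
Lu, Jianguo [1]
Wang, Hao [1]
Affiliation
[1] Univ Windsor, Sch Comp Sci, Windsor, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
NETWORKS
DOI
10.1016/j.ins.2017.08.030
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract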
The norm in data size estimation is to use uniform random samples whenever possible. There have been tremendous efforts to obtain uniform random samples using methods such as Metropolis-Hastings random walk or importance sampling [2]. This paper shows that, contrary to common practice, uniform random sampling should be avoided when PPS (probability proportional to size) sampling is available for large data. To develop intuition for the sampling process, we discuss the sampling and estimation problem in the context of graphs. The size is the number of nodes in the graph; uniform random sampling corresponds to uniform random node (RN) sampling; and PPS sampling is approximated by random edge (RE) sampling. In this setting, we show that for large graphs RE sampling outperforms RN sampling by a ratio proportional to the normalized graph degree variance. This result is particularly important in the era of big data, when data are typically large and scale-free [3], resulting in large degree variance. We derive the result by giving the variances of the RN and RE estimators. Each step of the derivation is supported and demonstrated by simulation studies assuming power-law distributions. We then use 18 real-world networks to verify the result. Furthermore, we show that the performance of random walk (RW) sampling is data dependent and can be significantly worse than that of RN and RE. More specifically, RW can estimate the size of online social networks but not of Web graphs, owing to the difference in graph conductance. Crown Copyright (C) 2017 Published by Elsevier Inc. All rights reserved.
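As a rough illustration of the setting described in the abstract, the following Python sketch compares size estimates obtained from uniform node (RN) draws with estimates from degree-proportional (RE-style) draws. It is not the paper's code: the collision-based estimators, the synthetic Pareto degree sequence, and every function name here are assumptions chosen for illustration, and the abstract does not state which estimator formulas the authors use.

```python
import random
from collections import Counter


def collision_pairs(sample):
    """Count unordered pairs of draws that landed on the same node."""
    return sum(c * (c - 1) // 2 for c in Counter(sample).values())


def estimate_rn(sample):
    """Birthday-paradox estimate of N from uniform node draws (assumed form)."""
    n = len(sample)
    c = collision_pairs(sample)
    return n * (n - 1) / (2 * c) if c else float("inf")


def estimate_re(sample, degree):
    """Collision estimate of N from degree-proportional draws (assumed form).

    Each draw is reweighted by 1/degree to undo the sampling bias.
    """
    n = len(sample)
    c = collision_pairs(sample)
    if c == 0:
        return float("inf")
    sum_d = sum(degree[v] for v in sample)
    sum_inv_d = sum(1.0 / degree[v] for v in sample)
    # sum over ordered pairs i != j of d_i / d_j equals sum_d * sum_inv_d - n
    return (sum_d * sum_inv_d - n) / (2 * c)


def demo(num_nodes=20000, num_draws=2000, seed=0):
    """Run one RN and one RE estimate on a synthetic heavy-tailed degree list."""
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    # Hypothetical degree sequence: Pareto-distributed, capped at 1000.
    degree = [min(1000, int(rng.paretovariate(1.5))) for _ in nodes]
    # RN: every node is equally likely.
    rn_sample = rng.choices(nodes, k=num_draws)
    # RE: an endpoint of a uniformly random edge is a node drawn with
    # probability proportional to its degree.
    re_sample = rng.choices(nodes, weights=degree, k=num_draws)
    return estimate_rn(rn_sample), estimate_re(re_sample, degree)


if __name__ == "__main__":
    rn_hat, re_hat = demo()
    print(f"true N = 20000, RN estimate = {rn_hat:.0f}, RE estimate = {re_hat:.0f}")
```

Repeating `demo` over many seeds and comparing the spread of the two estimates is one way to see the effect of degree variance on the RN and RE estimators for a given degree distribution.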
Pages: 136-153
Page count: 18