Cluster-preserving sampling algorithm for large-scale graphs

Cited by: 4
Authors
Zhang, Jianpeng [1]
Chen, Hongchang [1]
Yu, Dingjiu [1, 2]
Pei, Yulong [3]
Deng, Yingjun [4]
Affiliations
[1] Informat Engn Univ, Natl Digital Switching Syst E&T Res Ctr, Zhengzhou 450001, Peoples R China
[2] Network Syst Dept Strateg Support Force, Beijing 100091, Peoples R China
[3] Eindhoven Univ Technol, Sch Comp Sci & Technol, NL-5612 AE Eindhoven, Netherlands
[4] Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
graph sampling; clustering structure; top-leader nodes; expansion strategies; large-scale graphs; NETWORKS;
DOI
10.1007/s11432-021-3370-4
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Graph sampling is an effective way to deal with scalability issues when analyzing large-scale graphs. Many sampling algorithms have been proposed, and sampling quality has been quantified using explicit properties of the sample (e.g., degree distribution). However, existing sampling techniques are inadequate for the task of sampling the clustering structure, which is a crucial property of current networks. In this paper, two novel top-leader sampling methods (TLS-e and TLS-i), which differ in their expansion strategies, are proposed to obtain representative samples that effectively preserve the clustering structure. The rationale behind them is to select the top-leader nodes of most clusters into the sample and then heuristically incorporate peripheral nodes into the sample using specific expansion strategies. Extensive experiments are conducted to investigate how well sampling techniques preserve the clustering structure of graphs. Our empirical results show that the proposed sampling algorithms preserve the population's clustering structure well and provide feasible solutions for sampling the clustering structure from large-scale graphs.
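The two-stage idea in the abstract (pick leader nodes first, then grow the sample around them) can be illustrated with a minimal Python sketch. This is not the paper's TLS-e or TLS-i: the abstract does not define how leaders are scored or how expansion proceeds, so the sketch assumes degree as the leader score, skips the neighbors of already chosen leaders so that leaders land in different dense regions, and expands round-robin by adding the outside node with the most links into each leader's region. The function names `build_adjacency`, `pick_leaders`, and `expand_sample` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def pick_leaders(adj, k):
    """Pick up to k leaders by degree (an assumed leader score).

    Neighbors of a chosen leader are skipped so that later leaders
    tend to come from other dense regions (also an assumption).
    """
    leaders, blocked = [], set()
    # Highest degree first; ties broken by node id for determinism.
    for node in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        if node in blocked:
            continue
        leaders.append(node)
        blocked.add(node)
        blocked.update(adj[node])
        if len(leaders) == k:
            break
    return leaders


def expand_sample(adj, leaders, target_size):
    """Grow one region per leader, round-robin, until target_size is reached.

    Each step adds the unsampled node with the most edges into the
    current leader's region (an assumed expansion heuristic).
    """
    regions = {leader: {leader} for leader in leaders}
    sample = set(leaders)
    grew = True
    while len(sample) < target_size and grew:
        grew = False
        for leader in leaders:
            if len(sample) >= target_size:
                break
            links = {}
            for u in regions[leader]:
                for v in adj[u]:
                    if v not in sample:
                        links[v] = links.get(v, 0) + 1
            if not links:
                continue  # this region has no unsampled neighbors left
            best = min(links, key=lambda n: (-links[n], n))
            regions[leader].add(best)
            sample.add(best)
            grew = True
    return sample
```

On a toy graph made of two 4-node cliques joined by a single bridge edge, calling `pick_leaders(adj, 2)` followed by `expand_sample(adj, leaders, 6)` yields a sample that draws three nodes from each clique, which is the kind of balanced coverage a cluster-preserving sampler aims for.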
Pages: 17