Cluster-preserving sampling algorithm for large-scale graphs

Cited by: 4
Authors
Zhang, Jianpeng [1]
Chen, Hongchang [1]
Yu, Dingjiu [1, 2]
Pei, Yulong [3]
Deng, Yingjun [4]
Affiliations
[1] Informat Engn Univ, Natl Digital Switching Syst E&T Res Ctr, Zhengzhou 450001, Peoples R China
[2] Network Syst Dept Strateg Support Force, Beijing 100091, Peoples R China
[3] Eindhoven Univ Technol, Sch Comp Sci & Technol, NL-5612 AE Eindhoven, Netherlands
[4] Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
graph sampling; clustering structure; top-leader nodes; expansion strategies; large-scale graphs; NETWORKS;
DOI
10.1007/s11432-021-3370-4
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Graph sampling is an effective way to address scalability issues when analyzing large-scale graphs. Many sampling algorithms have been proposed, and sampling quality has been quantified using explicit properties of the sample (e.g., degree distribution). However, existing sampling techniques are inadequate for one task: sampling the clustering structure, which is a crucial property of current networks. In this paper, two novel top-leader sampling methods (TLS-e and TLS-i), which differ in their expansion strategies, are proposed to obtain representative samples that effectively preserve the clustering structure. The rationale behind them is to select the top-leader nodes of most clusters into the sample and then heuristically incorporate peripheral nodes using specific expansion strategies. Extensive experiments are conducted to investigate how well sampling techniques preserve the clustering structure of graphs. The empirical results show that the proposed sampling algorithms preserve the population's clustering structure well and provide feasible solutions for sampling the clustering structure from large-scale graphs.
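The two-phase idea described in the abstract (seed the sample with leader nodes, then grow it with peripheral nodes) can be illustrated with a minimal sketch. This is not the paper's TLS-e or TLS-i: the abstract does not define how leaders are chosen or how expansion works, so the sketch assumes degree as a stand-in for "leadership", skips candidates adjacent to an already chosen leader so that seeds spread out, and expands greedily by the number of links into the current sample. The function name `top_leader_sample` and its parameters are hypothetical.

```python
def top_leader_sample(adj, num_leaders, sample_size):
    """Illustrative leader-then-expand sampler (not the paper's algorithm).

    adj: dict mapping each node to the set of its neighbours (undirected).
    Returns (leaders, sample) where sample is a set of nodes.
    """
    # Phase 1 (assumption): rank nodes by degree, highest first, and pick
    # leaders greedily while skipping nodes adjacent to a chosen leader.
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    leaders = []
    for node in ranked:
        if len(leaders) == num_leaders:
            break
        if all(node not in adj[leader] for leader in leaders):
            leaders.append(node)

    sample = set(leaders[:sample_size])

    # Phase 2 (assumption): repeatedly add the frontier node that has the
    # most neighbours already in the sample; ties go to the smaller node.
    while len(sample) < sample_size:
        frontier = {nb for n in sample for nb in adj[n]} - sample
        if not frontier:
            break  # nothing reachable is left to add
        best = min(frontier, key=lambda n: (-len(adj[n] & sample), n))
        sample.add(best)

    return leaders, sample
```

The adjacency-set representation keeps the sketch dependency-free. On a real large-scale graph the frontier would be maintained incrementally instead of being rebuilt on every iteration.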
Pages: 17