Cluster-preserving sampling algorithm for large-scale graphs

Cited by: 4

Authors:

Zhang, Jianpeng [1]

Chen, Hongchang [1]

Yu, Dingjiu [1,2]

Pei, Yulong [3]

Deng, Yingjun [4]

Affiliations:

[1] Informat Engn Univ, Natl Digital Switching Syst E&T Res Ctr, Zhengzhou 450001, Peoples R China

[2] Network Syst Dept Strateg Support Force, Beijing 100091, Peoples R China

[3] Eindhoven Univ Technol, Sch Comp Sci & Technol, NL-5612 AE Eindhoven, Netherlands

[4] Tianjin Univ, Ctr Appl Math, Tianjin 300072, Peoples R China

Source:

SCIENCE CHINA-INFORMATION SCIENCES | 2023 / Vol. 66 / Issue 01

Funding:

China Postdoctoral Science Foundation;

Keywords:

graph sampling; clustering structure; top-leader nodes; expansion strategies; large-scale graphs; NETWORKS;

DOI:

10.1007/s11432-021-3370-4

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Graph sampling is a very effective method to deal with scalability issues when analyzing large-scale graphs. Lots of sampling algorithms have been proposed, and sampling qualities have been quantified using explicit properties (e.g., degree distribution) of the sample. However, the existing sampling techniques are inadequate for the current sampling task: sampling the clustering structure, which is a crucial property of the current networks. In this paper, using different expansion strategies, two novel top-leader sampling methods (i.e., TLS-e and TLS-i) are proposed to obtain representative samples, and they are capable of effectively preserving the clustering structure. The rationale behind them is to select top-leader nodes of most clusters into the sample and then heuristically incorporate peripheral nodes into the sample using specific expansion strategies. Extensive experiments are conducted to investigate how well sampling techniques preserve the clustering structure of graphs. Our empirical results show that the proposed sampling algorithms can preserve the population's clustering structure well and provide feasible solutions to sample the clustering structure from large-scale graphs.


Pages: 17
