A scalable community detection algorithm for large graphs using stochastic block models

Cited by: 4
Authors
Peng, Chengbin [1 ,2 ]
Zhang, Zhihua [3 ]
Wong, Ka-Chun [4 ]
Zhang, Xiangliang [1 ]
Keyes, David E. [1 ]
Affiliations
[1] King Abdullah Univ Sci & Technol, Post Box 2925, Thuwal 23955-6900, Saudi Arabia
[2] Ningbo Inst Ind Technol, Ningbo, Zhejiang, Peoples R China
[3] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[4] City Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
Keywords
Stochastic block model; parallel computing; community detection; MULTI;
DOI
10.3233/IDA-163156
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Community detection in graphs is widely used in social and biological networks, and the stochastic block model is a powerful probabilistic tool for describing graphs with community structures. However, in the era of "big data", traditional inference algorithms for such a model are increasingly limited by their high time complexity and poor scalability. In this paper, we propose a multi-stage maximum likelihood approach to recover the latent parameters of the stochastic block model, in time linear in the number of edges. We also propose a parallel algorithm based on message passing. Our algorithm can overlap communication and computation, providing speedup without compromising accuracy as the number of processors grows. For example, to process a real-world graph with about 1.3 million nodes and 10 million edges, our algorithm requires about 6 seconds on 64 cores of a contemporary commodity Linux cluster. Experiments demonstrate that the algorithm produces high-quality results on both benchmark and real-world graphs. A comparison with a popular modularity maximization algorithm illustrates a case in which our approach finds more meaningful communities.
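As a rough illustration of the model the abstract refers to, the sketch below evaluates the likelihood of a standard Bernoulli stochastic block model for a fixed community assignment. It is not the multi-stage algorithm or the message-passing parallel scheme of the paper, neither of which is described in this record. The function name `sbm_mle` and the toy graph are hypothetical, and the graph is assumed to be undirected and simple (no self-loops or duplicate edges).

```python
from collections import Counter
from math import log


def sbm_mle(edges, labels, num_blocks):
    """Fit block connection probabilities for FIXED labels (hypothetical helper).

    Returns (probs, loglik), where probs[(a, b)] with a <= b is the maximum
    likelihood estimate of the edge probability between blocks a and b, and
    loglik is the Bernoulli SBM log-likelihood at that estimate.
    """
    sizes = Counter(labels)

    # Single pass over the edge list: count edges per unordered block pair.
    edge_counts = Counter()
    for u, v in edges:
        a, b = sorted((labels[u], labels[v]))
        edge_counts[(a, b)] += 1

    probs = {}
    loglik = 0.0
    for a in range(num_blocks):
        for b in range(a, num_blocks):
            # Number of node pairs that could carry an edge for this block pair.
            if a == b:
                pairs = sizes[a] * (sizes[a] - 1) // 2
            else:
                pairs = sizes[a] * sizes[b]
            m = edge_counts[(a, b)]
            p = m / pairs if pairs else 0.0
            probs[(a, b)] = p
            # Terms with p equal to 0 or 1 contribute exactly zero.
            if 0.0 < p < 1.0:
                loglik += m * log(p) + (pairs - m) * log(1.0 - p)
    return probs, loglik


# Toy graph (assumption): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good_labels = [0, 0, 0, 1, 1, 1]  # one block per triangle
bad_labels = [0, 1, 0, 1, 0, 1]   # blocks cut across the triangles

good_probs, good_ll = sbm_mle(edges, good_labels, 2)
bad_probs, bad_ll = sbm_mle(edges, bad_labels, 2)
print(good_ll > bad_ll)
```

Counting edges per block pair takes one pass over the edge list, so evaluating a candidate assignment costs time proportional to the number of edges plus the number of block pairs. The assignment that matches the two triangles scores a higher log-likelihood than the one that cuts across them, which is the signal a maximum likelihood method searches for.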
Pages: 1463-1485
Page count: 23
Related papers
50 in total
  • [31] EigenSpokes: Surprising patterns and Scalable Community Chipping in Large Graphs
    Prakash, B. Aditya
    Seshadri, Mukund
    Sridharan, Aswin
    Machiraju, Sridhar
    Faloutsos, Christos
    2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 290 - +
  • [32] EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs
    Prakash, B. Aditya
    Sridharan, Ashwin
    Seshadri, Mukund
    Machiraju, Sridhar
    Faloutsos, Christos
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II, PROCEEDINGS, 2010, 6119 : 435 - +
  • [33] Scalable parallel simulation of dynamical processes on large stochastic Kronecker graphs
    Bochenina, Klavdiya
    Kesarev, Sergey
    Boukhanovsky, Alexander
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 78 : 502 - 515
  • [34] CONSISTENCY OF COMMUNITY DETECTION IN NETWORKS UNDER DEGREE-CORRECTED STOCHASTIC BLOCK MODELS
    Zhao, Yunpeng
    Levina, Elizaveta
    Zhu, Ji
    ANNALS OF STATISTICS, 2012, 40 (04): : 2266 - 2292
  • [35] Profile-pseudo likelihood methods for community detection of multilayer stochastic block models
    Fu, Kang
    Hu, Jianwei
    STAT, 2023, 12 (01):
  • [36] Community detection in general stochastic block models: fundamental limits and efficient algorithms for recovery
    Abbe, Emmanuel
    Sandon, Colin
    2015 IEEE 56TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, 2015, : 670 - 688
  • [37] Large deviations for empirical measures of dense stochastic block graphs
    Zheng Wenhua
    Liu Qun
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2020,
  • [38] Large-Scale Graphs Community Detection using Spark GraphFrames
    Apostol, Elena-Simona
    Cojocaru, Adrian-Cosmin
    Truica, Ciprian-Octavian
    2024 23RD INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING, ISPDC 2024, 2024,
  • [39] A distributed overlapping community detection model for large graphs using autoencoder
    Bhatia, Vandana
    Rani, Rinkle
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 94 : 16 - 26
  • [40] MDPCluster: a swarm-based community detection algorithm in large-scale graphs
    Shirjini, Mahsa Fozuni
    Farzi, Saeed
    Nikanjam, Amin
    COMPUTING, 2020, 102 (04) : 893 - 922