Finding connected components in an undirected graph has many practical applications. For example, in a graph representing a social network, a connected component represents a group of related individuals with common interests. Finding connected components also forms the basis for other clustering algorithms. In this paper, we present a parallel algorithm for finding connected components in an undirected graph that uses the well-known sequential algorithm as its basis. The algorithm can be adapted to run on a single multi-core computer or on MapReduce. It is robust in the sense that it honors memory limits, which is important in today's containerized environments, and it balances the workload even in the presence of data skew. For the best known algorithm running in MapReduce, the number of iterations is the square of the logarithm of the number of vertices in the graph. For our algorithm, we prove that the number of iterations is bounded above by a logarithmic function of the size of the largest connected component. In each iteration, the amount of data read from or written to the file system is bounded by four times the number of edges in the graph.
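For orientation, the sequential baseline can be sketched as follows. The abstract does not say which sequential algorithm is meant, so this sketch assumes union-find (disjoint sets) as one common choice; the function name `connected_components` and its parameters are hypothetical and are not taken from the paper. It illustrates only the sequential problem, not the parallel or MapReduce algorithm.

```python
from collections import defaultdict


def connected_components(edges, vertices=()):
    """Return the connected components of an undirected graph as a list of sets.

    Assumption: union-find is used as the sequential method; the paper's
    actual sequential basis may differ.
    """
    parent = {}  # maps each vertex to its parent in the disjoint-set forest

    def find(v):
        # Register unseen vertices as their own root.
        parent.setdefault(v, v)
        root = v
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every vertex on the path at the root.
        while parent[v] != root:
            nxt = parent[v]
            parent[v] = root
            v = nxt
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for v in vertices:  # isolated vertices form singleton components
        find(v)
    for u, w in edges:
        union(u, w)

    groups = defaultdict(set)
    for v in parent:
        groups[find(v)].add(v)
    return list(groups.values())
```

For example, `connected_components([(1, 2), (2, 3), (4, 5)], vertices=[6])` groups the vertices into three components: `{1, 2, 3}`, `{4, 5}`, and `{6}`.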