CCFinder: using Spark to find clustering coefficient in big graphs

Authors
Mehdi Alemi
Hassan Haghighi
Saeed Shahrivari
Affiliations
[1] Shahid Beheshti University, Faculty of Computer Science and Engineering
[2] G. C., Department of Computer Engineering
[3] Tarbiat Modares University (TMU)
Source
The Journal of Supercomputing | 2017 / Vol. 73 (11)
Keywords
Triangle counting; Clustering coefficient; MapReduce; Graph processing
DOI
Not available
Abstract
Networks with billions of vertices make it difficult to perform graph analysis in a reasonable time. The clustering coefficient is an important analytical measure for networks such as social and biological networks. Existing distributed algorithms for computing the clustering coefficient of big graphs are inefficient: they may fail because of excessive memory demands, and even when they complete successfully, their execution time is unacceptable for real-world applications. We present a distributed MapReduce-based algorithm, called CCFinder, that efficiently computes the clustering coefficient of very big graphs. CCFinder runs on Apache Spark, a scalable data processing platform. It detects triangles efficiently using our proposed data structure, called FONL, which is cached in the distributed memory provided by Spark and reused multiple times. Because data items in the FONL are fine-grained and contain only the minimum required information, CCFinder requires less storage space and achieves better parallelism than its competitors. To find the clustering coefficient, our triangle-counting solution is extended so that the degree information of the vertices is available where it is needed. We performed several experiments on a Spark cluster with 60 processors. The results show that CCFinder achieves acceptable scalability and outperforms six existing competitor methods. Four of these are based on graph processing systems, namely GraphX, NScale, NScaleSpark, and Pregel, and the other two, Cohen's method and NodeIterator++, are based on MapReduce.
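As an illustration of the quantities the abstract refers to, the following is a minimal single-machine sketch in plain Python, not the CCFinder algorithm itself. It assumes the standard definition of the local clustering coefficient, C(v) = 2·T(v) / (d(v)·(d(v) − 1)), where T(v) is the number of triangles through v and d(v) is its degree. The degree-ordered neighbor lists used here are only an assumption about what a structure like FONL might resemble, since the abstract does not describe its layout; the function name `local_clustering` and the variable names are hypothetical.

```python
from collections import defaultdict


def local_clustering(edges):
    """Return {vertex: local clustering coefficient} for an undirected graph.

    `edges` is an iterable of (u, v) pairs with mutually comparable vertex ids.
    Illustrative sketch only; not the distributed method from the paper.
    """
    # Build undirected adjacency sets, ignoring self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    deg = {v: len(ns) for v, ns in adj.items()}

    def rank(v):
        # Total order on vertices: by degree, ties broken by vertex id.
        return (deg[v], v)

    # Assumed FONL-like structure: for each vertex keep only the neighbors
    # that rank higher, sorted by rank, so each triangle is found exactly
    # once, starting from its lowest-ranked vertex.
    higher = {
        v: sorted((w for w in adj[v] if rank(w) > rank(v)), key=rank)
        for v in adj
    }
    higher_sets = {v: set(ns) for v, ns in higher.items()}

    triangles = {v: 0 for v in adj}
    for u, ns in higher.items():
        for i, v in enumerate(ns):
            for w in ns[i + 1:]:
                # u-v and u-w exist; the triangle closes if v-w exists.
                if w in higher_sets[v]:
                    triangles[u] += 1
                    triangles[v] += 1
                    triangles[w] += 1

    # Vertices with degree below 2 cannot be in a triangle; report 0.0.
    return {
        v: (2 * triangles[v] / (deg[v] * (deg[v] - 1)) if deg[v] > 1 else 0.0)
        for v in adj
    }
```

For example, calling `local_clustering([(1, 2), (2, 3), (1, 3), (3, 4)])` on a triangle with one pendant vertex gives coefficient 1.0 for vertices 1 and 2, 1/3 for vertex 3, and 0.0 for vertex 4. Ordering neighbors by degree keeps the lists of high-degree vertices short, which is a common way to limit the work per vertex in triangle counting; whether CCFinder uses exactly this ordering is not stated in the abstract.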
Pages: 4683-4710
Number of pages: 27