Parallel Spectral Clustering Based on MapReduce

Cited by: 4
Authors
Qiwei Zhong [1]
Yunlong Lin [1]
Junyang Zou [1]
Kuangyan Zhu [1]
Qiao Wang [1]
Lei Hu [2]
Affiliations
[1] School of Information Science and Engineering, Southeast University
[2] ZTE Corporation
Keywords
spectral clustering; parallel implementation; massive dataset; Hadoop MapReduce; data mining
DOI
Not available
Chinese Library Classification (CLC) number
TP311.13
Subject classification code
1201
Abstract
Clustering is one of the most widely used techniques for exploratory data analysis. Spectral clustering, a popular modern clustering algorithm, has been shown to be more effective at detecting clusters than many traditional algorithms. It has applications ranging from computer vision and information retrieval to social science and biology. As databases grow rapidly, clustering algorithms face scalability problems in both computation time and memory use. In this paper, we propose a parallel spectral clustering implementation based on MapReduce. Both the computation and the data storage are distributed, which addresses the scalability problems of most existing algorithms. We empirically analyze the proposed implementation on both benchmark networks and a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. The results show that the proposed implementation scales well, speeds up the clustering without sacrificing quality, and processes massive datasets efficiently on clusters of commodity machines.
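The abstract does not describe how the computation is split into MapReduce jobs, so the following is only an illustrative sketch and not the authors' implementation. It simulates, in a single Python process, two jobs that a spectral method on a graph typically needs: one that computes vertex degrees from an edge list, and one that multiplies the normalized adjacency matrix D^{-1/2} A D^{-1/2} by a vector. The helper `map_reduce`, the function names, and the toy edge list `EDGES` are all hypothetical. For brevity, the sketch splits the graph into two clusters by the sign of the second eigenvector (found by power iteration) instead of running k-means on several eigenvectors.

```python
from collections import defaultdict
from math import sqrt


def map_reduce(records, mapper, reducer):
    """In-process stand-in for one MapReduce job (hypothetical helper)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase
            groups[key].append(value)      # shuffle: group values by key
    # reduce phase: one call per key
    return {key: reducer(key, values) for key, values in groups.items()}


def degrees(edges):
    # Job 1: each undirected edge adds 1 to the degree of both endpoints.
    return map_reduce(
        edges,
        lambda e: [(e[0], 1), (e[1], 1)],
        lambda _key, values: sum(values),
    )


def normalized_matvec(edges, deg, x):
    # Job 2: y = D^{-1/2} A D^{-1/2} x, accumulated edge by edge.
    def mapper(edge):
        u, v = edge
        w = 1.0 / sqrt(deg[u] * deg[v])
        return [(u, w * x[v]), (v, w * x[u])]

    return map_reduce(edges, mapper, lambda _key, values: sum(values))


def bipartition(edges, iterations=200):
    deg = degrees(edges)
    nodes = sorted(deg)
    # The leading eigenvector of the normalized adjacency matrix is
    # proportional to sqrt(degree); it carries no cluster information.
    norm = sqrt(sum(deg.values()))
    top = {n: sqrt(deg[n]) / norm for n in nodes}
    x = {n: float(i + 1) for i, n in enumerate(nodes)}  # deterministic start
    for _ in range(iterations):
        y = normalized_matvec(edges, deg, x)
        # Iterate with (N + I) / 2 so that all eigenvalues are non-negative.
        x = {n: 0.5 * (y[n] + x[n]) for n in nodes}
        # Remove the component along the leading eigenvector (deflation).
        dot = sum(x[n] * top[n] for n in nodes)
        x = {n: x[n] - dot * top[n] for n in nodes}
        length = sqrt(sum(value * value for value in x.values()))
        x = {n: value / length for n, value in x.items()}
    # Split the vertices by the sign of the second eigenvector.
    return {n: int(x[n] > 0) for n in nodes}


# Toy graph (assumption): two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = bipartition(EDGES)
```

On this toy graph the two triangles receive different labels. In a real Hadoop deployment each call to `map_reduce` would correspond to a distributed job over an edge list stored in HDFS, and the vector `x` would have to be shared with the mappers; how the paper handles this is not stated in the abstract.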
Pages: 45-50
Page count: 6