Parallel Spectral Clustering Based on MapReduce

Cited by: 4
Authors
Qiwei Zhong [1]
Yunlong Lin [1]
Junyang Zou [1]
Kuangyan Zhu [1]
Qiao Wang [1]
Lei Hu [2]
Affiliations
[1] School of Information Science and Engineering, Southeast University
[2] ZTE Corporation
Keywords
spectral clustering; parallel implementation; massive dataset; Hadoop MapReduce; data mining
DOI
Not available
Chinese Library Classification number
TP311.13
Discipline classification code
1201
Abstract
Clustering is one of the most widely used techniques for exploratory data analysis. Spectral clustering, a popular modern clustering algorithm, has been shown to be more effective in detecting clusters than many traditional algorithms. It has applications ranging from computer vision and information retrieval to social science and biology. As the size of databases soars, the computational time and memory use of clustering algorithms become scaling bottlenecks. In this paper, we propose a parallel spectral clustering implementation based on MapReduce. Both the computation and the data storage are distributed, which solves the scalability problems of most existing algorithms. We empirically analyze the proposed implementation on both benchmark networks and a real social network dataset of about two million vertices and two billion edges crawled from Sina Weibo. The results show that the proposed implementation scales well, speeds up clustering without sacrificing quality, and processes massive datasets efficiently on clusters of commodity machines.
Pages: 45-50
Page count: 6