A hybrid MapReduce-based k-means clustering using genetic algorithm for distributed datasets

Cited by: 0

Authors:

Ankita Sinha

Prasanta K. Jana

Affiliations:

[1] Department of Computer Science and Engineering, IIT (ISM), Dhanbad

Source:

The Journal of Supercomputing | 2018 / Vol. 74

Keywords:

Mahalanobis distance; Apache Hadoop; k-means++ initialization; Genetic algorithm

DOI:

Not available

Abstract:

Clustering a large volume of data in a distributed environment is challenging: the data stored across multiple machines are huge, and the solution space is large. Genetic algorithms deal effectively with large solution spaces and can provide better solutions. In this paper, we propose a novel clustering algorithm for distributed datasets that combines a genetic algorithm (GA) using Mahalanobis distance with the k-means clustering algorithm. The proposed algorithm has two phases. In phase 1, GA is applied in parallel to data chunks located on different machines. Mahalanobis distance is used as the fitness value in GA; it accounts for the covariance between data points and thus provides a better representation of the initial data. In phase 2, k-means with k-means++ initialization is applied to the intermediate output to obtain the final result. The proposed algorithm is implemented on the Hadoop framework, which is inherently designed to handle distributed datasets in a fault-tolerant manner. Extensive experiments were conducted on multiple real-life and synthetic datasets to measure the performance of the proposed algorithm. Results were compared with the MapReduce-based algorithms mrk-means, parallel k-means and scaling GA.
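The two building blocks named in the abstract, Mahalanobis distance and k-means++ initialization, can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it leaves out the GA and Hadoop entirely, handles 2-D points only for brevity, and the function names (`covariance_2d`, `mahalanobis_2d`, `kmeanspp_seeds`) are hypothetical names chosen for illustration.

```python
import random


def covariance_2d(points):
    """Return the sample covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between 2-D points a and b under covariance cov."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = a[0] - b[0], a[1] - b[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, with the 2x2 inverse written out
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5


def kmeanspp_seeds(points, k, rng):
    """Pick k initial centers by k-means++ seeding (D^2-weighted sampling).

    Assumes points contains at least k distinct points.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared Euclidean distance from each point to its nearest chosen center
        d2 = [
            min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers)
            for p in points
        ]
        # Points far from every existing center are more likely to be picked
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.3, 4.8), (9.0, 0.5), (9.2, 0.1)]
    cov = covariance_2d(data)
    print(mahalanobis_2d(data[0], data[2], cov))
    print(kmeanspp_seeds(data, 3, random.Random(0)))
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance; a larger variance along one axis shrinks distances measured along that axis, which is the sense in which the measure accounts for covariance between data points.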

Pages: 1562-1579

Page count: 17

