Big data clustering techniques based on Spark: a literature review

Authors:

Saeed M.M. [1]

Aghbari Z.A. [2]

Alsharidah M. [1]

Affiliations:

[1] Department of Computer Science, Prince Sattam Bin Abdul Aziz, Riyadh

[2] Department of Computer Science, University of Sharjah, Sharjah

Source:

PeerJ Computer Science | 2020 / Vol. 6

Keywords:

Big Data; Big Data clustering; Spark; Spark-based clustering

DOI:

10.7717/PEERJ-CS.321

Abstract:

Clustering, a popular unsupervised learning method, is extensively used in data mining, machine learning and pattern recognition. It groups individual data points so that points in the same cluster are similar to each other and dissimilar to points in other clusters. Traditional clustering methods are greatly challenged by the recent massive growth of data. Therefore, several research works have proposed novel designs for clustering methods that leverage the benefits of Big Data platforms such as Apache Spark, which is designed for fast, distributed processing of massive data. However, Spark-based clustering research is still in its early days. In this systematic survey, we investigate the existing Spark-based clustering methods in terms of their support for the characteristics of Big Data. Moreover, we propose a new taxonomy for Spark-based clustering methods. To the best of our knowledge, no survey has been conducted on Spark-based clustering of Big Data. Therefore, this survey aims to present a comprehensive summary of previous studies in the field of Big Data clustering using Apache Spark during 2010-2020. This survey also highlights new research directions in the field of clustering massive data. © Copyright 2020 Saeed et al.

Pages: 1 / 28

Number of pages: 27
