From Micro-benchmarks to Machine Learning: Unveiling the Efficiency and Scalability of Hadoop and Spark

Cited by: 0

Authors:

Hebabaze, Salah Eddine [1]

El Ghmary, Mohamed [2]

El Bouabidi, Hamid [1]

Maftah, Sara [1]

Amnai, Mohamed [1]

Affiliations:

[1] Ibn Tofaïl University, Kenitra, Morocco

[2] Sidi Mohamed Ben Abdellah University, Fez, Morocco

Source:

International Journal of Interactive Mobile Technologies | 2024 / Vol. 18 / No. 17

Keywords:

Adversarial machine learning - Benchmarking - MapReduce - Spatio-temporal data

DOI:

10.3991/ijim.v18i17.44555

Abstract:

With the exponential growth of data, the demand for efficient and scalable data processing solutions has become paramount. Hadoop and Spark, pivotal components of the open-source Big Data landscape, have been put to the test in this study. We conducted a comprehensive performance analysis of Hadoop and Spark in virtualized environments, evaluating their prowess across a suite of benchmarks. The benchmarks encompassed a spectrum of workloads, from micro-benchmarks such as Sort, WordCount, and TeraSort to web search tasks such as PageRank and machine learning endeavors including Naive Bayes and K-means. The central focus was to gauge their performance, efficiency, and resource utilization. The findings of this study underscore the benefits of Spark’s in-memory processing, demonstrating its superiority over Hadoop in various scenarios. Spark excels in machine learning and web search applications, particularly when handling smaller inputs. Its efficient memory management and support for multiple iterations make it a strong choice. In resource-constrained environments or when dealing with large input files and limited memory, Hadoop may still hold an edge. The design and implementation of data processing solutions in virtualized environments should carefully consider the specific demands of each framework. This study not only presents a performance comparison of Hadoop and Spark across different benchmarks but also emphasizes the vital implications for designing and deploying data processing solutions in virtualized settings. It serves as a cornerstone for informed decision-making, paving the way for optimized algorithms and techniques in the dynamic landscape of big data processing. © 2024 by the authors of this article.

Pages: 46-60
