From Micro-benchmarks to Machine Learning: Unveiling the Efficiency and Scalability of Hadoop and Spark

Authors
Hebabaze, Salah Eddine [1]
El Ghmary, Mohamed [2]
El Bouabidi, Hamid [1]
Maftah, Sara [1]
Amnai, Mohamed [1]
Affiliations
[1] Ibn Tofaïl University, Kenitra, Morocco
[2] Sidi Mohamed Ben Abdellah University, Fez, Morocco
Keywords
Adversarial machine learning - Benchmarking - MapReduce - Spatio-temporal data
DOI
10.3991/ijim.v18i17.44555
Abstract
With the exponential growth of data, the demand for efficient and scalable data processing solutions has become paramount. Hadoop and Spark, pivotal components of the open-source Big Data landscape, have been put to the test in this study. We conducted a comprehensive performance analysis of Hadoop and Spark in virtualized environments, evaluating their prowess across a suite of benchmarks. The benchmarks encompassed a spectrum of workloads, from micro-benchmarks such as Sort, WordCount, and TeraSort to web search tasks such as PageRank and machine learning endeavors including Naive Bayes and K-means. The central focus was to gauge their performance, efficiency, and resource utilization. The findings of this study underscore the benefits of Spark’s in-memory processing, demonstrating its superiority over Hadoop in various scenarios. Spark excels in machine learning and web search applications, particularly when handling smaller inputs. Its efficient memory management and support for multiple iterations make it a strong choice. In resource-constrained environments or when dealing with large input files and limited memory, Hadoop may still hold an edge. The design and implementation of data processing solutions in virtualized environments should carefully consider the specific demands of each framework. This study not only presents a performance comparison of Hadoop and Spark across different benchmarks but also emphasizes the vital implications for designing and deploying data processing solutions in virtualized settings. It serves as a cornerstone for informed decision-making, paving the way for optimized algorithms and techniques in the dynamic landscape of big data processing. © 2024 by the authors of this article.
Pages: 46-60