From Micro-benchmarks to Machine Learning: Unveiling the Efficiency and Scalability of Hadoop and Spark

Authors
Hebabaze, Salah Eddine [1]
El Ghmary, Mohamed [2]
El Bouabidi, Hamid [1]
Maftah, Sara [1]
Amnai, Mohamed [1]
Affiliations
[1] Ibn Tofaïl University, Kenitra, Morocco
[2] Sidi Mohamed Ben Abdellah University, Fez, Morocco
Keywords
Adversarial machine learning - Benchmarking - MapReduce - Spatio-temporal data
DOI
10.3991/ijim.v18i17.44555
Abstract
With the exponential growth of data, the demand for efficient and scalable data processing solutions has become paramount. Hadoop and Spark, pivotal components of the open-source Big Data landscape, have been put to the test in this study. We conducted a comprehensive performance analysis of Hadoop and Spark in virtualized environments, evaluating their prowess across a suite of benchmarks. The benchmarks encompassed a spectrum of workloads, from micro-benchmarks such as Sort, WordCount, and TeraSort to web search tasks such as PageRank and machine learning endeavors including Naive Bayes and K-means. The central focus was to gauge their performance, efficiency, and resource utilization. The findings of this study underscore the benefits of Spark’s in-memory processing, demonstrating its superiority over Hadoop in various scenarios. Spark excels in machine learning and web search applications, particularly when handling smaller inputs. Its efficient memory management and support for multiple iterations make it a strong choice. In resource-constrained environments or when dealing with large input files and limited memory, Hadoop may still hold an edge. The design and implementation of data processing solutions in virtualized environments should carefully consider the specific demands of each framework. This study not only presents a performance comparison of Hadoop and Spark across different benchmarks but also emphasizes the vital implications for designing and deploying data processing solutions in virtualized settings. It serves as a cornerstone for informed decision-making, paving the way for optimized algorithms and techniques in the dynamic landscape of big data processing. © 2024 by the authors of this article.
Pages: 46-60