From Micro-benchmarks to Machine Learning: Unveiling the Efficiency and Scalability of Hadoop and Spark

Authors
Hebabaze, Salah Eddine [1]
El Ghmary, Mohamed [2]
El Bouabidi, Hamid [1]
Maftah, Sara [1]
Amnai, Mohamed [1]
Affiliations
[1] Ibn Tofaïl University, Kenitra, Morocco
[2] Sidi Mohamed Ben Abdellah University, Fez, Morocco
Keywords
Adversarial machine learning - Benchmarking - MapReduce - Spatio-temporal data
DOI
10.3991/ijim.v18i17.44555
Abstract
With the exponential growth of data, the demand for efficient and scalable data processing solutions has become paramount. Hadoop and Spark, pivotal components of the open-source Big Data landscape, have been put to the test in this study. We conducted a comprehensive performance analysis of Hadoop and Spark in virtualized environments, evaluating their prowess across a suite of benchmarks. The benchmarks encompassed a spectrum of workloads, from micro-benchmarks such as Sort, WordCount, and TeraSort to web search tasks such as PageRank and machine learning endeavors including Naive Bayes and K-means. The central focus was to gauge their performance, efficiency, and resource utilization. The findings of this study underscore the benefits of Spark’s in-memory processing, demonstrating its superiority over Hadoop in various scenarios. Spark excels in machine learning and web search applications, particularly when handling smaller inputs. Its efficient memory management and support for multiple iterations make it a strong choice. In resource-constrained environments or when dealing with large input files and limited memory, Hadoop may still hold an edge. The design and implementation of data processing solutions in virtualized environments should carefully consider the specific demands of each framework. This study not only presents a performance comparison of Hadoop and Spark across different benchmarks but also emphasizes the vital implications for designing and deploying data processing solutions in virtualized settings. It serves as a cornerstone for informed decision-making, paving the way for optimized algorithms and techniques in the dynamic landscape of big data processing. © 2024 by the authors of this article.
Pages: 46-60