Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Cited by: 0
Authors
Taran, Vladyslav [1]
Alienin, Oleg [1]
Stirenko, Sergii [1]
Gordienko, Yuri [1]
Rojbi, A. [2]
Affiliations
[1] Natl Tech Univ Ukraine, Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine
[2] Univ Paris 08, CHArt Lab, Human & Artificial Cognit, 2 Rue Liberte, F-93526 St Denis, France
Keywords
information systems; Big Data; distributed computing; clusters; Hadoop; Spark; speedup; machine learning; multimodal interactions; data image processing and recognition
DOI
Not available
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Recently, owing to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. It was found that the running times grow very fast with the dataset size, faster even than a power function. For both the real and virtual cluster implementations, this tendency is similar for the Hadoop and Spark frameworks. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual version of the cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the current observations on running times and speedup for the Hadoop and Spark frameworks in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present.
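The word-counting test task named in the abstract can be illustrated with a minimal single-machine sketch in plain Python. This is not the authors' Hadoop or Spark code: the function names, the tokenization rule, and the definition of speedup as a ratio of two running times are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from functools import reduce
import re


def map_words(line):
    """Map step: count the lowercase word tokens found in one line."""
    # Tokenization rule is an assumption: runs of letters and apostrophes.
    return Counter(re.findall(r"[a-z']+", line.lower()))


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into a new Counter."""
    return left + right


def word_count(lines):
    """Apply the map step to every line, then fold the partial counts."""
    return reduce(reduce_counts, map(map_words, lines), Counter())


def speedup(baseline_seconds, measured_seconds):
    """Assumed definition: baseline running time divided by measured time."""
    return baseline_seconds / measured_seconds


# Hypothetical two-line input, not data from the paper.
sample = ["Big Data on Hadoop", "big data on Spark"]
counts = word_count(sample)
```

In a distributed framework the map step would run on separate compute nodes over partitions of the text and the reduce step would merge their partial counts; here both run sequentially in one process.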
Pages: 80-83
Number of pages: 4