Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Cited by: 0
Authors
Taran, Vladyslav [1]
Alienin, Oleg [1]
Stirenko, Sergii [1]
Gordienko, Yuri [1]
Rojbi, A. [2]
Affiliations
[1] Natl Tech Univ Ukraine, Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine
[2] Univ Paris 08, CHArt Lab, Human & Artificial Cognit, 2 Rue Liberte, F-93526 St Denis, France
Keywords
information systems; Big Data; distributed computing; clusters; Hadoop; Spark; speedup; machine learning; multimodal interactions; data image processing and recognition
DOI
Not available
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Recently, owing to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. It was found that the running times grow very fast with the dataset size, faster even than a power function. This tendency is similar for both the Hadoop and Spark frameworks in the real and virtual cluster implementations. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the current observations on running times and speedup for the Hadoop and Spark frameworks in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present.
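The word-counting test task named in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the authors' benchmark code: the function names are hypothetical, the map and reduce steps run locally rather than on Hadoop or Spark, and the speedup definition (baseline running time divided by cluster running time) is an assumption, since the abstract does not state the formula used.

```python
from collections import Counter
from functools import reduce


def map_chunk(text):
    """Map step: count the words in one chunk of text (hypothetical helper)."""
    return Counter(text.lower().split())


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(chunks):
    """Apply the map step to every chunk, then fold the partial counts together."""
    return reduce(reduce_counts, map(map_chunk, chunks), Counter())


def speedup(t_baseline, t_cluster):
    """Assumed definition: baseline running time divided by cluster running time."""
    return t_baseline / t_cluster


# Two toy chunks standing in for input splits spread across compute nodes.
chunks = ["big data big clusters", "Spark and Hadoop clusters"]
counts = word_count(chunks)
```

In a real cluster the map step would run in parallel on separate nodes and the reduce step would merge results after a shuffle; the sketch only shows the data flow.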
Pages: 80-83
Number of pages: 4