Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Authors
Taran, Vladyslav [1]
Alienin, Oleg [1]
Stirenko, Sergii [1]
Gordienko, Yuri [1]
Rojbi, A. [2]
Affiliations
[1] Natl Tech Univ Ukraine, Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine
[2] Univ Paris 08, CHArt Lab, Human & Artificial Cognit, 2 Rue Liberte, F-93526 St Denis, France
Keywords
information systems; Big Data; distributed computing; clusters; Hadoop; Spark; speedup; machine learning; multimodal interactions; data image processing and recognition
DOI
Not available
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Owing to the rapid development of information and communication technologies, data are now created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. The running times were found to grow very fast with dataset size, faster even than a power function. This tendency is similar for both the Hadoop and Spark frameworks in the real and virtual cluster implementations. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the current observations on running times and speedup for Hadoop and Spark in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present.
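For readers unfamiliar with the word-counting test task and the speedup measure named in the abstract, the following is a minimal single-machine sketch in plain Python. It is not the authors' Hadoop or Spark code. The function names are hypothetical, and the speedup definition (baseline running time divided by cluster running time) is the conventional one, assumed here because the record does not state which definition the paper uses.

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_count(chunk: str) -> Counter:
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.lower().split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(chunks: Iterable[str]) -> Counter:
    """Count words over all chunks by mapping each chunk, then reducing."""
    return reduce(reduce_counts, map(map_count, chunks), Counter())


def speedup(baseline_seconds: float, cluster_seconds: float) -> float:
    """Assumed definition: baseline running time over cluster running time."""
    return baseline_seconds / cluster_seconds
```

In a real Hadoop or Spark job the chunks would be input splits processed on different nodes; here they are ordinary strings, so the sketch only illustrates the shape of the computation, not its distributed performance.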
Pages: 80-83
Number of pages: 4