Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Cited by: 0
Authors
Taran, Vladyslav [1 ]
Alienin, Oleg [1 ]
Stirenko, Sergii [1 ]
Gordienko, Yuri [1 ]
Rojbi, A. [2 ]
Affiliations
[1] Natl Tech Univ Ukraine, Igor Sikorsky Kyiv Polytech Inst, Kiev, Ukraine
[2] Univ Paris 08, CHArt Lab, Human & Artificial Cognit, 2 Rue Liberte, F-93526 St Denis, France
Keywords
information systems; Big Data; distributed computing; clusters; Hadoop; Spark; speedup; machine learning; multimodal interactions; data image processing and recognition;
DOI
Not available
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08 ;
Abstract
Recently, owing to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of word counting in texts of various sizes. It was found that the running times grow very fast with the dataset size, even faster than a power function. This tendency is similar for both the Hadoop and Spark frameworks in the real and virtual cluster implementations. Moreover, speedup values decrease significantly as the dataset size grows, especially for the virtual cluster configuration. The problem of growing data generated by IoT and multimodal (visual, sound, tactile, neuro and brain-computing, muscle and eye tracking, etc.) interaction channels is presented. In the context of this problem, the current observations on running times and speedup for the Hadoop and Spark frameworks in real and virtual cluster configurations can be very useful for proper scaling-up and efficient job management, especially for machine learning and Deep Learning applications, where Big Data are widely present.
Pages: 80 - 83
Number of pages: 4
Related papers
50 in total
  • [41] Source camera identification: a distributed computing approach using Hadoop
    Faiz, Muhammad
    Anuar, Nor Badrul
    Wahab, Ainuddin Wahid Abdul
    Shamshirband, Shahaboddin
    Chronopoulos, Anthony T.
    JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS, 2017, 6
  • [42] Research on parallel algorithm based on hadoop distributed computing platform
    Heilongjiang University of Technology, Jixi, China
    Int. J. Grid Distrib. Comput., 4 (163-170):
  • [43] Source camera identification: a distributed computing approach using Hadoop
    Muhammad Faiz
    Nor Badrul Anuar
    Ainuddin Wahid Abdul Wahab
    Shahaboddin Shamshirband
    Anthony T. Chronopoulos
    Journal of Cloud Computing, 6
  • [44] Hadoop Extensions for Distributed Computing on Reconfigurable Active SSD Clusters
    Kaitoua, Abdulrahman
    Hajj, Hazem
    Saghir, Mazen A. R.
    Artail, Hassan
    Akkary, Haitham
    Awad, Mariette
    Sharafeddine, Mageda
    Mershad, Khaleel
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2014, 11 (02) : 191 - 216
  • [45] Strategies for distributed parallel computing on grid computing environments
    Lin, Weiwei
    Zhang, Zhili
    Qi, Deyu
    Jisuanji Gongcheng/Computer Engineering, 2006, 32 (09) : 104 - 106
  • [46] Performance Evaluation of Data Mining Frameworks in Hadoop Cluster Using Virtual Campus Log Files
    Xhafa, Fatos
    Ramirez, Daniel
    Garcia, Daniel
    Caballe, Santi
    2015 INTERNATIONAL CONFERENCE ON INTELLIGENT NETWORKING AND COLLABORATIVE SYSTEMS IEEE INCOS 2015, 2015 : 217 - 222
  • [47] The Performance Evaluation of K-means by Two MapReduce Frameworks, Hadoop vs. Twister
    Kang, Yunhee
    Park, Young B.
    2015 INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING (ICOIN), 2015 : 405 - 406
  • [48] High Performance Hadoop Distributed File System
    Elkawkagy, Mohamed
    Elbeh, Heba
    INTERNATIONAL JOURNAL OF NETWORKED AND DISTRIBUTED COMPUTING, 2020, 8 (03) : 119 - 123
  • [49] The Hadoop Distributed Filesystem: Balancing Portability and Performance
    Shafer, Jeffrey
    Rixner, Scott
    Cox, Alan L.
    2010 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2010), 2010 : 122 - 133
  • [50] Simulation in parallel and distributed computing environments
    Zomaya, AY
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 1998, 13 (01) : 3 - 4