An Enhanced Data-Locality-Aware Task Scheduling Algorithm for Hadoop Applications

Cited by: 13
Authors
Choi, Dongjoo [1 ]
Jeon, Myunghoon [1 ]
Kim, Namgi [1 ]
Lee, Byoung-Dai [1 ]
Affiliations
[1] Kyonggi Univ, Comp Sci Dept, Suwon 443760, South Korea
Source
IEEE SYSTEMS JOURNAL | 2018 / Vol. 12 / No. 04
Funding
National Research Foundation, Singapore;
Keywords
Data locality; Hadoop distributed file system (HDFS); MapReduce; task scheduling;
DOI
10.1109/JSYST.2017.2764481
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In general, Hadoop improves the task scheduling performance by determining data locality based on the location in which the input splits and MapTask are executed. However, if an input split consists of multiple data blocks that are distributed and stored in different nodes, this data location method fails to cope with the degradation in processing performance due to the increased frequency of data block copying. We propose a task scheduling algorithm that solves this issue by defining a method to classify data locality taking into account the location of all data blocks that comprise an input split, categorizing tasks based on the defined method, and sequentially assigning tasks according to a given priority. This study measures the performance of the proposed algorithm through a comparison of the total processing time, MapTask performance time, and data block copying frequency between the proposed algorithm and Hadoop's default task scheduling algorithm. The test results show that the proposed algorithm improved the total processing time by up to 25% and the data block copying frequency by up to 28%, when compared to the default algorithm.
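The abstract describes classifying data locality from the location of every data block in an input split, grouping tasks by that class, and assigning them in priority order. A minimal Python sketch of that idea follows. The record does not give the paper's actual categories or priority rule, so the three classes used here (all blocks local, some local, none local), the tie-break on local block count, and all names (`Split`, `locality_class`, `pick_task`) are illustrative assumptions, not the authors' algorithm.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Split:
    """Hypothetical input split: one set of replica hostnames per data block."""
    name: str
    block_hosts: Tuple[FrozenSet[str], ...]


def local_blocks(split: Split, node: str) -> int:
    """Count the blocks of the split that have a replica on the given node."""
    return sum(1 for hosts in split.block_hosts if node in hosts)


def locality_class(split: Split, node: str) -> int:
    """Assumed classification: 0 = all blocks local, 1 = some local, 2 = none local."""
    local = local_blocks(split, node)
    if local == len(split.block_hosts):
        return 0
    if local > 0:
        return 1
    return 2


def pick_task(pending: List[Split], node: str) -> Optional[Split]:
    """Pick the pending split with the best class for this node.

    Ties within a class go to the split with more local blocks
    (an assumed tie-break, not taken from the paper).
    """
    if not pending:
        return None
    return min(pending, key=lambda s: (locality_class(s, node), -local_blocks(s, node)))


if __name__ == "__main__":
    a = Split("a", (frozenset({"n1"}), frozenset({"n2"})))
    b = Split("b", (frozenset({"n1", "n3"}), frozenset({"n1"})))
    c = Split("c", (frozenset({"n2"}),))
    chosen = pick_task([a, b, c], "n1")
    print(chosen.name if chosen else None)
```

Under these assumptions, a node asking for work receives a split whose blocks are all stored locally before one that would require copying any block from another node, which is the effect on block-copy frequency that the abstract measures.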
Pages: 3346 - 3357
Page count: 12
Related papers
50 items in total
  • [21] Data Locality Aware Algorithm for Task Execution on Distributed, Cloud Based Environments
    Bica, Mihai
    Gorgan, Dorian
    COMPLEX, INTELLIGENT, AND SOFTWARE INTENSIVE SYSTEMS, CISIS-2017, 2018, 611 : 557 - 566
  • [22] Load balancing task scheduling algorithm in Hadoop platform
    Cai Yandong
    Liu Yan
    Zhang Qinglei
    2015 SEVENTH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2015), 2015, : 605 - 608
  • [23] Locality-Aware CTA Scheduling for Gaming Applications
    Ukarande, Aditya
    Patidar, Suryakant
    Rangan, Ram
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2022, 19 (01)
  • [24] An Energy-aware Task Scheduling Algorithm for a Heterogeneous Data Center
    Zhang, Shuo
    Wang, Baosheng
    Zhao, Baokang
    Tao, Jing
    2013 12TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2013), 2013, : 1471 - 1477
  • [25] Resource Scheduling and Data Locality for Virtualized Hadoop on IaaS Cloud Platform
    Tao, Dan
    Wang, Bingxu
    Lin, Zhaowen
    Wu, Tin-Yu
    BIG DATA COMPUTING AND COMMUNICATIONS, (BIGCOM 2016), 2016, 9784 : 332 - 341
  • [26] Profit-oriented task scheduling algorithm in Hadoop cluster
    Chai, Xu-qing
    Dong, Yong-liang
    Li, Jun-fei
    EURASIP JOURNAL ON EMBEDDED SYSTEMS, 2016,
  • [27] Locality-aware task scheduling for homogeneous parallel computing systems
    Muhammad Khurram Bhatti
    Isil Oz
    Sarah Amin
    Maria Mushtaq
    Umer Farooq
    Konstantin Popov
    Mats Brorsson
    Computing, 2018, 100 : 557 - 595
  • [28] Locality-aware task scheduling for homogeneous parallel computing systems
    Bhatti, Muhammad Khurram
    Oz, Isil
    Amin, Sarah
    Mushtaq, Maria
    Farooq, Umer
    Popov, Konstantin
    Brorsson, Mats
    COMPUTING, 2018, 100 (06) : 557 - 595
  • [29] Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors
    Muddukrishna, Ananya
    Jonsson, Peter A.
    Brorsson, Mats
    SCIENTIFIC PROGRAMMING, 2015, 2015
  • [30] Leveraging Data-Flow Task Parallelism for Locality-Aware Dynamic Scheduling on Heterogeneous Platforms
    Simsek, Osman Seckin
    Drebes, Andi
    Pop, Antoniu
    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018, : 540 - 549