IDaPS - Improved data-locality aware data placement strategy based on Markov clustering to enhance MapReduce performance on Hadoop

Cited by: 3
Authors
Vengadeswaran, S. [1]
Balasundaram, S. R. [2]
Dhavakumar, P. [3]
Affiliations
[1] Indian Inst Informat Technol Kottayam, Dept Comp Sci & Engn, Pala 686635, Kerala, India
[2] Natl Inst Technol, Dept Comp Applicat, Tiruchirappalli 620015, Tamil Nadu, India
[3] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai 600127, Tamil Nadu, India
Keywords
Big data; HDFS; MapReduce; Data placement; Intra-dependency; Cloud
DOI
10.1016/j.jksuci.2024.101973
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Executing MapReduce applications on a Hadoop cluster poses significant challenges when data locality, i.e., assigning tasks to the compute nodes where their input data sets are stored, is not considered. Ignoring data locality causes high data-transfer overheads. Further, latency increases whenever input data must be transferred across the network, which significantly increases execution time. To address this issue, an Improved DAta Placement Strategy (IDaPS) based on the intra-dependency among the data is proposed. IDaPS reorganizes the default data layout in HDFS to ensure a higher degree of parallelism. The efficiency of IDaPS is demonstrated on Hadoop clusters of 10 and 15 nodes by executing Hadoop benchmark performance tests, viz. WordCount and Grep on the Project Gutenberg book dataset (50 …), and Least Square Linear Regression (LSLR) on a weather dataset (10.67 GB). The results were compared with state-of-the-art algorithms, viz. Hadoop Default Data Placement (HDDP), Load-Balancer, and prior work from the literature (…). The results demonstrate that IDaPS reduces execution time by 28.2% and 38.4% in the 10-node and 15-node clusters, respectively, while executing WordCount, and by 35% and 38.1% in the 10-node and 15-node clusters, respectively, for Grep. Similarly, for LSLR, it reduces execution time by 32.7%.
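The abstract does not spell out how IDaPS turns intra-dependency into a layout. As a rough illustration only, the sketch below applies the generic Markov Clustering (MCL) algorithm to a block co-access (intra-dependency) matrix, then spreads each resulting cluster round-robin across nodes, so that blocks likely to be read by the same job sit on different nodes and can be processed in parallel with local reads. The input format (`access_log`, a list of block-ID sets read together by one job), all function names, the inflation value of 2, and the round-robin placement rule are assumptions for illustration, not the authors' published algorithm.

```python
from itertools import combinations


def build_dependency_matrix(n_blocks, access_log):
    """Hypothetical input: access_log is a list of sets of block IDs that one
    job reads together. Entry [i][j] counts how often blocks i and j co-occur."""
    m = [[0.0] * n_blocks for _ in range(n_blocks)]
    for blocks in access_log:
        for i, j in combinations(sorted(blocks), 2):
            m[i][j] += 1.0
            m[j][i] += 1.0
    for i in range(n_blocks):
        m[i][i] += 1.0  # self-loops, as is customary in MCL
    return m


def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic), in place."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(m, inflation=2.0, max_iter=100, tol=1e-6, prune=1e-5):
    """Generic Markov Clustering; returns a list of clusters (block-ID lists)."""
    m = normalize_columns([row[:] for row in m])
    for _ in range(max_iter):
        # Expansion: squaring spreads flow along paths of length 2.
        expanded = matmul(m, m)
        # Inflation: element-wise power boosts strong flow, then renormalize;
        # entries below `prune` are dropped to zero.
        inflated = [[v ** inflation if v > prune else 0.0 for v in row]
                    for row in expanded]
        inflated = normalize_columns(inflated)
        n = len(m)
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each non-empty row belongs to an attractor; its non-zero columns form a cluster.
    clusters, seen = [], set()
    for row in m:
        members = tuple(j for j, v in enumerate(row) if v > prune)
        if members and members not in seen:
            seen.add(members)
            clusters.append(list(members))
    return clusters


def place_clusters(clusters, n_nodes):
    """Hypothetical placement rule: spread each cluster round-robin over nodes,
    carrying the offset forward so the overall load stays balanced."""
    placement, offset = {}, 0
    for cluster in clusters:
        for k, block in enumerate(cluster):
            placement[block] = (offset + k) % n_nodes
        offset = (offset + len(cluster)) % n_nodes
    return placement


if __name__ == "__main__":
    # Hypothetical log: blocks 0-2 are always read together, as are blocks 3-5.
    log = [{0, 1, 2}] * 3 + [{3, 4, 5}] * 3
    clusters = mcl(build_dependency_matrix(6, log))
    print(clusters)
    print(place_clusters(clusters, n_nodes=3))
```

With this sample log, MCL separates blocks {0, 1, 2} from {3, 4, 5}, and each group is spread over all three nodes, so a job reading either group can run one local map task per node. Round-robin is used here only because it is the simplest rule that spreads a group evenly; a real strategy would also have to respect HDFS replication and node capacity.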
Pages: 10