Optimizing and Tuning MapReduce Jobs to Improve the Large-Scale Data Analysis Process

Cited by: 11
Authors
Premchaiswadi, Wichian [1]
Romsaiyud, Walisa [1]
Affiliation
[1] Siam Univ, Grad Sch Informat Technol, Bangkok 10160, Thailand
Keywords
Data handling - Fault tolerance;
DOI
10.1002/int.21563
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data-intensive applications process large volumes of data in parallel. MapReduce is both a programming model designed for data-intensive applications on massive data sets and an execution framework for large-scale data processing on clusters of commodity servers. Although fault tolerance, a simple programming structure, and high scalability are considered strong points of MapReduce, its configuration parameters must be fine-tuned to the specific deployment, which makes configuration and performance tuning more complex. This paper explains how to tune the Hadoop configuration parameters that directly affect the performance of the MapReduce job workflow under various conditions, with the aim of achieving maximum performance. The empirical data we collected showed that three main methodologies can affect the execution time of MapReduce running on cluster systems. We therefore present a model that consists of three main modules: (1) extending a data redistribution technique in order to find the high-performance nodes, (2) utilizing the number of map/reduce slots in order to reduce execution time, and (3) developing a new hybrid routing schedule for the shuffle phase in order to define the scheduler task while the memory management level is reduced. © 2012 Wiley Periodicals, Inc.
Pages: 185-200
Page count: 16