MapReduce: A Flexible Data Processing Tool

Cited by: 724
Authors
Dean, Jeffrey
Ghemawat, Sanjay
Affiliation
[1] Systems Infrastructure Group of Google, Mountain View, CA
DOI
10.1145/1629175.1629198
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
MapReduce (MR) is a programming model for processing and generating large data sets, and it has emerged as a flexible data processing tool for a wide range of applications. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. MapReduce automatically parallelizes the program and executes it on a large cluster of commodity machines. The runtime system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
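The map and reduce roles described in the abstract can be illustrated with a word count. What follows is only a minimal single-process sketch in Python, not the authors' implementation: the names `run_mapreduce`, `map_word_count`, and `reduce_word_count` are hypothetical, and the sketch leaves out everything the real runtime provides (input partitioning, scheduling across machines, failure handling, inter-machine communication).

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_word_count(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: take one (key, value) pair and emit intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word: str, counts: list[int]) -> int:
    """Reduce: merge all intermediate values that share one intermediate key."""
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[tuple[str, str]],
    map_fn: Callable[[str, str], Iterable[tuple[str, int]]],
    reduce_fn: Callable[[str, list[int]], int],
) -> dict[str, int]:
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    # Group intermediate values by intermediate key (the "shuffle" step).
    intermediate: dict[str, list[int]] = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    # Apply the reduce function once per distinct intermediate key.
    return {ikey: reduce_fn(ikey, values) for ikey, values in intermediate.items()}


# Illustrative input data, invented for this sketch.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(documents, map_word_count, reduce_word_count)
print(word_counts)
```

In this sketch the user writes only the two small functions; the driver stands in for the runtime system, which in a real deployment would spread the map and reduce calls over many machines.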
Pages: 72-77
Page count: 6