iMapReduce: A Distributed Computing Framework for Iterative Computation

Cited by: 95
Authors
Zhang, Yanfeng [1, 3]
Gao, Qixin [2 ]
Gao, Lixin [3 ]
Wang, Cuirong [2 ]
Affiliations
[1] Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110819, Liaoning, Peoples R China
[2] NE Univ Qinhuangdao, Dept Elect & Informat Engn, Qinhuangdao 066000, Hebei, Peoples R China
[3] Univ Massachusetts Amherst, Dept Elect & Comp Engn, Amherst, MA 01002 USA
Funding
U.S. National Science Foundation
Keywords
Iterative computation; iMapReduce; Distributed computing framework; Hadoop; LINK-PREDICTION
DOI
10.1007/s10723-012-9204-9
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Iterative computation is pervasive in applications such as data mining, web ranking, graph analysis, and online social network analysis. These iterative applications typically involve massive data sets containing millions or billions of data records. This creates demand for distributed computing frameworks that can process massive data sets on a cluster of machines. MapReduce is an example of such a framework. However, MapReduce lacks built-in support for iterative processes that need to parse data sets repeatedly. Besides specifying MapReduce jobs, users have to write a driver program that submits a series of jobs and performs convergence testing at the client. This paper presents iMapReduce, a distributed framework that supports iterative processing. iMapReduce allows users to specify the iterative computation with separate map and reduce functions, and supports automatic iterative processing within a single job. More importantly, iMapReduce significantly improves the performance of iterative implementations by (1) reducing the overhead of repeatedly creating new MapReduce jobs, (2) eliminating the shuffling of static data, and (3) allowing asynchronous execution of map tasks. We implement an iMapReduce prototype based on Apache Hadoop, and show that iMapReduce can achieve a speedup of up to 5 times over Hadoop for implementing iterative algorithms.
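The driver pattern that the abstract describes for plain MapReduce (one job per iteration, with convergence tested at the client) can be sketched as follows. This is a minimal in-memory illustration, not Hadoop or iMapReduce code: PageRank is assumed as the example algorithm, and the names `map_fn`, `reduce_fn`, `run_job`, `driver`, and the `DAMPING` value are all hypothetical.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor, chosen only for this illustration


def map_fn(node, rank, neighbors):
    """Map: send an equal share of this node's rank to each out-neighbor."""
    yield node, 0.0  # keep every node present in the reduce input
    for dst in neighbors:
        yield dst, rank / len(neighbors)


def reduce_fn(contributions, num_nodes):
    """Reduce: combine the incoming rank shares for one node."""
    return (1 - DAMPING) / num_nodes + DAMPING * sum(contributions)


def run_job(graph, ranks):
    """Simulate one MapReduce job: map, shuffle (group by key), reduce."""
    shuffled = defaultdict(list)
    for node, neighbors in graph.items():
        # The static graph structure is re-read on every job, which is
        # part of the overhead the abstract says iMapReduce avoids.
        for key, value in map_fn(node, ranks[node], neighbors):
            shuffled[key].append(value)
    return {node: reduce_fn(vals, len(graph)) for node, vals in shuffled.items()}


def driver(graph, tol=1e-6, max_iters=100):
    """Client-side driver: submit one job per iteration, then test convergence."""
    ranks = {node: 1.0 / len(graph) for node in graph}
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_ranks = run_job(graph, ranks)
        delta = sum(abs(new_ranks[n] - ranks[n]) for n in graph)
        ranks = new_ranks
        if delta < tol:  # convergence test performed at the client
            break
    return ranks, iteration


# Toy graph (assumed): every node has at least one out-link.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks, iterations = driver(graph)
print(iterations, ranks)
```

In this sketch each loop iteration stands in for a separately submitted job. According to the abstract, iMapReduce instead runs the whole iteration inside a single job, which removes the per-job setup cost and the repeated shuffling of the static graph data.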
Pages: 47-68
Page count: 22