Tiled-MapReduce: Optimizing Resource Usages of Data-parallel Applications on Multicore with Tiling

Cited by: 56
Authors
Chen, Rong [1]
Chen, Haibo [1]
Zang, Binyu [1]
Affiliations
[1] Fudan Univ, Parallel Proc Inst, Shanghai, Peoples R China
Keywords
MapReduce; Tiled-MapReduce; Tiling; Multicore; CORE
DOI
10.1145/1854273.1854337
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The prevalence of chip multiprocessors opens opportunities for running data-parallel applications, originally run on clusters, on a single machine with many cores. MapReduce, a simple and elegant programming model for programming large-scale clusters, has recently been shown to be a promising alternative for harnessing multicore platforms. Differences between clusters and multicore platforms, such as memory hierarchy and communication patterns, raise new challenges in designing and implementing an efficient MapReduce system on multicore. This paper argues that, on shared-memory multicore platforms, it is more efficient for MapReduce to iteratively process small chunks of data in turn than to process a large chunk of data at one time. Based on this argument, we extend the general MapReduce programming model with a "tiling strategy", called Tiled-MapReduce (TMR). TMR partitions a large MapReduce job into a number of small sub-jobs and iteratively processes one sub-job at a time with efficient use of resources; finally, TMR merges the results of all sub-jobs for output. Based on Tiled-MapReduce, we design and implement several optimization techniques targeting multicore, including the reuse of input and intermediate data structures among sub-jobs, a NUCA/NUMA-aware scheduler, and pipelining a sub-job's reduce phase with the successive sub-job's map phase, to optimize memory, cache, and CPU resources respectively. We have implemented a prototype of Tiled-MapReduce based on Phoenix, an already highly optimized MapReduce runtime for shared-memory multiprocessors. The prototype, named Ostrich, runs on an Intel machine with 16 cores. Experiments on four different types of benchmarks show that Ostrich saves up to 85% of memory, causes fewer cache misses, and makes more efficient use of CPU cores, resulting in speedups ranging from 1.2X to 3.3X.
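The tiling idea described in the abstract (split a large job into small sub-jobs, process them one at a time, then merge the per-sub-job results) can be illustrated with a minimal single-threaded word-count sketch. This is not the Ostrich or Phoenix implementation, which targets shared-memory multicore machines; the function names (`tiles`, `map_phase`, `reduce_phase`, `tiled_word_count`) and the `tile_size` parameter are hypothetical and chosen only for illustration. The sketch also omits the data-structure reuse, NUCA/NUMA-aware scheduling, and pipelining that the abstract mentions.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def tiles(items: Iterable[str], tile_size: int) -> Iterator[List[str]]:
    """Yield successive chunks of at most tile_size items (one chunk per sub-job)."""
    it = iter(items)
    while True:
        chunk = list(islice(it, tile_size))
        if not chunk:
            return
        yield chunk


def map_phase(lines: List[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in the tile."""
    return [(word, 1) for line in lines for word in line.split()]


def reduce_phase(pairs: List[Tuple[str, int]]) -> Counter:
    """Sum the values for each key within a single sub-job."""
    partial: Counter = Counter()
    for key, value in pairs:
        partial[key] += value
    return partial


def tiled_word_count(lines: Iterable[str], tile_size: int = 2) -> Counter:
    """Run map and reduce on one tile at a time, then merge all partial results."""
    partials = [reduce_phase(map_phase(tile)) for tile in tiles(lines, tile_size)]
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts rather than replacing them
    return merged


if __name__ == "__main__":
    print(tiled_word_count(["a b", "b c", "a a"], tile_size=2))
```

Because only one tile's intermediate pairs exist at a time, the peak size of the intermediate data in this sketch is bounded by the tile size rather than by the whole input, which is the intuition behind the memory savings the abstract reports.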
Pages: 523-534
Number of pages: 12