Tiled-MapReduce: Optimizing Resource Usages of Data-parallel Applications on Multicore with Tiling

Cited by: 56
Authors
Chen, Rong [1]
Chen, Haibo [1]
Zang, Binyu [1]
Affiliations
[1] Fudan Univ, Parallel Proc Inst, Shanghai, Peoples R China
Keywords
MapReduce; Tiled-MapReduce; Tiling; Multicore; CORE
DOI
10.1145/1854273.1854337
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The prevalence of chip multiprocessors opens opportunities to run data-parallel applications, originally designed for clusters, on a single machine with many cores. MapReduce, a simple and elegant programming model for large-scale clusters, has recently been shown to be a promising alternative for harnessing the multicore platform. Differences between clusters and multicore platforms, such as memory hierarchy and communication patterns, raise new challenges in designing and implementing an efficient MapReduce system on multicore. This paper argues that, on shared-memory multicore platforms, it is more efficient for MapReduce to iteratively process small chunks of data in turn than to process one large chunk of data at a time. Based on this argument, we extend the general MapReduce programming model with a "tiling strategy", called Tiled-MapReduce (TMR). TMR partitions a large MapReduce job into a number of small sub-jobs and iteratively processes one sub-job at a time with efficient use of resources; TMR finally merges the results of all sub-jobs for output. Based on Tiled-MapReduce, we design and implement several optimization techniques targeting multicore, including the reuse of input and intermediate data structures among sub-jobs, a NUCA/NUMA-aware scheduler, and pipelining a sub-job's reduce phase with the successive sub-job's map phase, to optimize memory, cache and CPU resources respectively. We have implemented a prototype of Tiled-MapReduce based on Phoenix, an already highly optimized MapReduce runtime for shared-memory multiprocessors. The prototype, named Ostrich, runs on an Intel machine with 16 cores. Experiments on four different types of benchmarks show that Ostrich saves up to 85% of memory, causes fewer cache misses and makes more efficient use of CPU cores, resulting in a speedup ranging from 1.2X to 3.3X.
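The tiling strategy described in the abstract (partition the job into small sub-jobs, process one sub-job at a time, then merge the results) can be illustrated with a minimal single-threaded sketch. This is not the Ostrich or Phoenix implementation: the word-count workload and the names `iter_tiles`, `map_phase`, `reduce_phase` and `tiled_word_count` are assumptions chosen for illustration, and the sketch omits the data-structure reuse, NUCA/NUMA-aware scheduling and pipelining that the paper describes.

```python
from collections import Counter
from itertools import islice


def iter_tiles(records, tile_size):
    """Yield successive lists of at most tile_size records (one tile per sub-job)."""
    it = iter(records)
    while True:
        tile = list(islice(it, tile_size))
        if not tile:
            return
        yield tile


def map_phase(tile):
    """Map: emit a (word, 1) pair for every word in every line of the tile."""
    for line in tile:
        for word in line.split():
            yield word, 1


def reduce_phase(pairs):
    """Reduce: sum the values per key for a single sub-job."""
    partial = Counter()
    for key, value in pairs:
        partial[key] += value
    return partial


def tiled_word_count(lines, tile_size):
    """Run one small sub-job per tile, merging each partial result into the total."""
    total = Counter()
    for tile in iter_tiles(lines, tile_size):
        partial = reduce_phase(map_phase(tile))
        total.update(partial)  # merge step: add this sub-job's counts to the total
    return dict(total)
```

Because only one tile's intermediate pairs exist at any moment, the working set of each sub-job is bounded by the tile size rather than by the whole input, which is the intuition behind the memory and cache savings the abstract reports. In this sketch the final result does not depend on the tile size chosen.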
Pages: 523-534
Number of pages: 12