Bi-Objective Optimization of Data-Parallel Applications on Homogeneous Multicore Clusters for Performance and Energy

Cited by: 45
Authors
Manumachu, Ravindranath Reddy [1]
Lastovetsky, Alexey [1]
Affiliations
[1] Univ Coll Dublin, Sch Comp Sci, Dublin 4, Ireland
Funding
Science Foundation Ireland
Keywords
Homogeneous multicore CPU clusters; data partitioning; load balancing; performance; energy; bi-objective optimization; DVFS; MODEL; ROOFLINE; SYSTEMS; MEMORY; TIME;
DOI
10.1109/TC.2017.2742513
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Performance and energy are now the dominant optimization objectives on modern parallel platforms composed of multicore CPU nodes. Existing intra-node and inter-node optimization methods employ a large set of decision variables but do not treat problem size as one of them; they assume that performance and energy consumption each depend linearly on problem size. Using experiments with real-life data-parallel applications on modern multicore CPUs, we demonstrate that these relationships are complex (non-linear and even non-convex) and, therefore, that problem size has become an important decision variable that can no longer be ignored. This key finding motivates our work. We first formulate the bi-objective optimization problem for performance and energy (BOPPE) for data-parallel applications on homogeneous clusters of modern multicore CPUs. It contains only one, heretofore unconsidered, decision variable: the problem size. We then present an efficient and exact global optimization algorithm called ALEPH that solves the BOPPE. It takes as inputs discrete functions of performance and dynamic energy consumption against problem size and outputs the globally Pareto-optimal set of solutions. The solutions are workload distributions that achieve inter-node optimization of data-parallel applications for performance and energy. While existing solvers for BOPPE give only one solution when the problem size and number of processors are fixed, our algorithm gives a diverse set of globally Pareto-optimal solutions. The algorithm has a time complexity of O(m^2 × p^2), where m is the number of points in the discrete speed/energy function and p is the number of available processors.
We experimentally study the efficiency and scalability of our algorithm for two data-parallel applications, matrix multiplication and fast Fourier transform, on a modern multicore CPU and on homogeneous clusters of such CPUs. Our experiments show that the average and maximum sizes of the globally Pareto-optimal sets determined by our algorithm are 15 and 34 for the first application and 7 and 20 for the second. Compared with the load-balanced workload distribution, the (average, maximum) percentage improvements demonstrated for the first application are (13%, 97%) in performance and (18%, 71%) in energy. For the second application, these improvements are (40%, 95%) and (22%, 127%). Assuming that a 5 percent performance degradation from the optimum is acceptable, the average and maximum improvements in energy consumption are 9 and 44 percent for the first application and 8 and 20 percent for the second. Using the algorithm and its building blocks, we also present a study of the interplay between performance and energy. We demonstrate how ALEPH can be combined with DVFS-based Multi-Objective Optimization (MOP) methods to give a better set of (globally Pareto-optimal) solutions.
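The abstract describes the inputs and outputs of the problem but not the internals of ALEPH, so the following is only an illustrative sketch of the problem setting, not the authors' algorithm. It assumes that execution time is the maximum over processors and dynamic energy is the sum over processors, that a processor may be left idle at zero cost, and that the discrete functions are given as dictionaries keyed by positive chunk sizes. The function names and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# (execution time, dynamic energy, workload distribution)
Point = Tuple[float, float, Tuple[int, ...]]


def pareto_filter(points: List[Point]) -> List[Point]:
    """Keep only points not dominated in (time, energy); both are minimized."""
    ordered = sorted(points, key=lambda q: (q[0], q[1]))
    front: List[Point] = []
    best_energy = float("inf")
    for t, e, d in ordered:
        if e < best_energy:  # strictly less energy than every faster point
            front.append((t, e, d))
            best_energy = e
    return front


def pareto_distributions(time_fn: Dict[int, float],
                         energy_fn: Dict[int, float],
                         n: int, p: int) -> List[Point]:
    """Pareto front of distributions of n units over p identical processors.

    Assumption: time_fn and energy_fn share the same positive integer keys,
    and an idle processor (size 0) costs no time and no dynamic energy.
    """
    sizes = sorted(time_fn)
    # states[w]: Pareto front of partial distributions whose sizes sum to w
    states: Dict[int, List[Point]] = {0: [(0.0, 0.0, ())]}
    for _ in range(p):
        nxt: Dict[int, List[Point]] = {}
        for w, front in states.items():
            for x in [0] + sizes:
                if w + x > n:
                    continue
                tx = time_fn[x] if x else 0.0
                ex = energy_fn[x] if x else 0.0
                for t, e, d in front:
                    nxt.setdefault(w + x, []).append(
                        (max(t, tx), e + ex, d + (x,)))
        # Dominated partial solutions can never become non-dominated later,
        # because max() and + are monotone in the partial time and energy.
        states = {w: pareto_filter(c) for w, c in nxt.items()}
    return states.get(n, [])


def min_energy_within(front: List[Point], tolerance: float) -> Point:
    """Lowest-energy point whose time is within `tolerance` of the best time."""
    best_time = min(t for t, _, _ in front)
    allowed = [q for q in front if q[0] <= best_time * (1 + tolerance)]
    return min(allowed, key=lambda q: q[1])


# Hypothetical non-linear profiles for chunk sizes 1..4 on one processor.
time_fn = {1: 1.0, 2: 3.0, 3: 3.5, 4: 5.0}
energy_fn = {1: 2.0, 2: 6.0, 3: 9.0, 4: 8.0}
front = pareto_distributions(time_fn, energy_fn, n=4, p=2)
for t, e, dist in front:
    print(f"time={t} energy={e} distribution={dist}")
```

With these made-up profiles the load-balanced distribution is the fastest but not the most energy-efficient, which is the kind of trade-off the abstract attributes to non-linear performance and energy functions. The helper `min_energy_within` mirrors the abstract's "acceptable performance degradation" scenario: it picks the lowest-energy point among those whose time stays within a chosen tolerance of the best time.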
Pages: 160-177
Page count: 18
Related papers
  • [1] Acceleration of Bi-Objective Optimization of Data-Parallel Applications for Performance and Energy on Heterogeneous Hybrid Platforms
    Manumachu, Ravi Reddy
    Khaleghzadeh, Hamidreza
    Lastovetsky, Alexey
    IEEE ACCESS, 2023, 11 : 27226 - 27245
  • [2] Bi-Objective Optimization of Data-Parallel Applications on Heterogeneous HPC Platforms for Performance and Energy Through Workload Distribution
    Khaleghzadeh, Hamidreza
    Fahad, Muhammad
    Shahid, Arsalan
    Manumachu, Ravi Reddy
    Lastovetsky, Alexey
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (03) : 543 - 560
  • [3] Multicore processor computing is not energy proportional: An opportunity for bi-objective optimization for energy and performance
    Khokhriakov, Semyon
    Manumachu, Ravi Reddy
    Lastovetsky, Alexey
    APPLIED ENERGY, 2020, 268
  • [4] New Model-Based Methods and Algorithms for Performance and Energy Optimization of Data Parallel Applications on Homogeneous Multicore Clusters
    Lastovetsky, Alexey
    Manumachu, Ravi Reddy
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (04) : 1119 - 1133
  • [5] Parallel Data Partitioning Algorithms for Optimization of Data-Parallel Applications on Modern Extreme-Scale Multicore Platforms for Performance and Energy
    Manumachu, Ravi Reddy
    Lastovetsky, Alexey
    IEEE ACCESS, 2018, 6 : 69075 - 69106
  • [6] Heterogeneous Voltage Frequency Scaling of Data-Parallel Applications for Energy Saving on Homogeneous Multicore Platforms
    Bratek, Pawel
    Szustak, Lukasz
    Wyrzykowski, Roman
    Olas, Tomasz
    Chmiel, Tomasz
    EURO-PAR 2021: PARALLEL PROCESSING WORKSHOPS, 2022, 13098 : 141 - 153
  • [7] Modeling the slowdown of data-parallel applications in homogeneous and heterogeneous clusters of workstations
    Figueira, SM
    Berman, F
    SEVENTH HETEROGENEOUS COMPUTING WORKSHOP (HCW '98), 1998, : 90 - 101
  • [8] Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors
    Carlos Saez, Juan
    Castro, Fernando
    Prieto-Matias, Manuel
    PROCEEDINGS OF THE 49TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2020, 2020,
  • [9] Bi-objective optimization of biclustering with binary data
    Hanafi, Said
    Palubeckis, Gintaras
    Glover, Fred
    INFORMATION SCIENCES, 2020, 538 : 444 - 466
  • [10] Partitioned Parallelization of MOEA/D for Bi-objective Optimization on Clusters
    Xie, Yuehong
    Ying, Weiqin
    Wu, Yu
    Wu, Bingshen
    Chen, Shiyun
    He, Weipeng
    COMPUTATIONAL INTELLIGENCE AND INTELLIGENT SYSTEMS, (ISICA 2015), 2016, 575 : 373 - 381