Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations

Authors
Yu, Yuan [1]
Gunda, Pradeep Kumar [1]
Isard, Michael [1]
Affiliations
[1] Microsoft Res, Mountain View, CA 94043 USA
Keywords
Distributed programming; cloud computing; concurrency
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data-intensive applications are increasingly designed to execute on large computing clusters. Grouped aggregation is a core primitive of many distributed programming models, and it is often the most efficient available mechanism for computations such as matrix multiplication and graph traversal. Such algorithms typically require non-standard aggregations that are more sophisticated than traditional built-in database functions such as Sum and Max. As a result, the ease of programming user-defined aggregations, and the efficiency of their implementation, are of great current interest. This paper evaluates the interfaces and implementations for user-defined aggregation in several state-of-the-art distributed computing systems: Hadoop, databases such as Oracle Parallel Server, and DryadLINQ. We show that: the degree of language integration between user-defined functions and the high-level query language has an impact on code legibility and simplicity; the choice of programming interface has a material effect on the performance of computations; some execution plans perform better than others on average; and, to get good performance on a variety of workloads, a system must be able to select between execution plans depending on the computation. The interface and execution plan described in the MapReduce paper, and implemented by Hadoop, are found to be among the worst-performing choices.
Pages: 247-260
Page count: 14