Exoshuffle: An Extensible Shuffle Architecture

Cited by: 2
Authors
Luan, Frank Sifei [1]
Wang, Stephanie [1, 2]
Yagati, Samyukta [1]
Kim, Sean [1]
Lien, Kenneth [1]
Ong, Isaac [1]
Hong, Tony [1]
Cho, SangBin [3]
Liang, Eric [3]
Stoica, Ion [1]
Affiliations
[1] University of California, Berkeley, Berkeley, CA 94720 USA
[2] Anyscale, Berkeley, CA USA
[3] Anyscale, San Francisco, CA USA
Keywords
Shuffle; MapReduce; distributed computing; extensibility
DOI
10.1145/3603269.3604848
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithic shuffle systems. These systems are costly to develop, and they are tightly integrated with batch processing frameworks that offer only high-level APIs such as SQL. New applications, such as ML training, require more flexibility and finer-grained interoperability with shuffle. They are often unable to leverage existing shuffle optimizations. We propose an extensible shuffle architecture. We present Exoshuffle, a library for distributed shuffle that offers competitive performance and scalability as well as greater flexibility than monolithic shuffle systems. We design an architecture that decouples the shuffle control plane from the data plane without sacrificing performance. We build Exoshuffle on Ray, a distributed futures system for data and ML applications, and demonstrate that we can: (1) rewrite previous shuffle optimizations as application-level libraries with an order of magnitude less code, (2) achieve shuffle performance and scalability competitive with monolithic shuffle systems, and break the CloudSort record as the world's most cost-efficient sorting system, and (3) enable new applications such as ML training to easily leverage scalable shuffle.
Pages: 564-577
Page count: 14