Exoshuffle: An Extensible Shuffle Architecture

被引:2
|
作者
Luan, Frank Sifei [1 ]
Wang, Stephanie [1 ,2 ]
Yagati, Samyukta [1 ]
Kim, Sean [1 ]
Lien, Kenneth [1 ]
Ong, Isaac [1 ]
Hong, Tony [1 ]
Cho, SangBin [3 ]
Liang, Eric [3 ]
Stoica, Ion [1 ]
机构
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Anyscale, Berkeley, CA USA
[3] Anyscale, San Francisco, CA USA
关键词
Shuffle; MapReduce; distributed computing; extensibility;
D O I
10.1145/3603269.3604848
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithic shuffle systems. These systems are costly to develop, and they are tightly integrated with batch processing frameworks that offer only high-level APIs such as SQL. New applications, such as ML training, require more flexibility and finer-grained interoperability with shuffle. They are often unable to leverage existing shuffle optimizations. We propose an extensible shuffle architecture. We present Exoshuffle, a library for distributed shuffle that offers competitive performance and scalability as well as greater flexibility than monolithic shuffle systems. We design an architecture that decouples the shuffle control plane from the data plane without sacrificing performance. We build Exoshuffle on Ray, a distributed futures system for data and ML applications, and demonstrate that we can: (1) rewrite previous shuffle optimizations as application-level libraries with an order of magnitude less code, (2) achieve shuffle performance and scalability competitive with monolithic shuffle systems, and break the CloudSort record as the world's most cost-efficient sorting system, and (3) enable new applications such as ML training to easily leverage scalable shuffle.
引用
收藏
页码:564 / 577
页数:14
相关论文
共 50 条
  • [1] EXTENSIBLE PACKAGED ARCHITECTURE
    HIRAKAWA, Y
    KAWASHIMA, N
    IMASE, M
    MICROPROCESSING AND MICROPROGRAMMING, 1988, 22 (01): : 65 - 74
  • [2] A Scalable and Extensible Blockchain Architecture
    Yu, Yue
    Liang, Ran
    Xu, Jiqiu
    2018 18TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2018, : 161 - 163
  • [3] VERA: an extensible router architecture
    Karlin, S
    Peterson, L
    COMPUTER NETWORKS, 2002, 38 (03) : 277 - 293
  • [4] Towards an Extensible Architecture for Ideation
    do Prado, Hercules A.
    Marcial, Elaine Coutinho
    Haendchen Filho, Aluizio
    Ferneda, Edilson
    Salvio, Roseane
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES 2019), 2019, 159 : 727 - 735
  • [5] An architecture for extensible middleware platforms
    Bruneton, E
    Riveill, M
    SOFTWARE-PRACTICE & EXPERIENCE, 2001, 31 (13): : 1237 - 1264
  • [6] Towards an Extensible WebLab Architecture
    Garcia-Zubia, J.
    Orduna, P.
    Irurzun, J.
    Hernandez, U.
    Sancristobal, E.
    Martin, S.
    Castro, M.
    Lopez-de-Ipina, D.
    Angulo, I.
    2009 3RD IEEE INTERNATIONAL CONFERENCE ON E-LEARNING IN INDUSTRIAL ELECTRONICS (ICELIE 2009), 2009, : 109 - +
  • [7] VERA: An extensible router architecture
    Karlin, S
    Peterson, L
    2001 IEEE OPEN ARCHITECTURES AND NETWORK PROGRAMMING PROCEEDINGS, 2001, : 3 - 14
  • [8] An extensible XML mapping architecture
    Jiyi, Wu
    PROCEEDINGS OF THE 26TH CHINESE CONTROL CONFERENCE, VOL 6, 2007, : 291 - 293
  • [9] An Extensible and Affordable Exploration Architecture
    Duggan, Matthew
    Engle, James
    Moseman, Travis
    2017 IEEE AEROSPACE CONFERENCE, 2017,
  • [10] An extensible software architecture for mobile components
    Johansen, D
    Lauvset, KJ
    Marzullo, K
    NINTH ANNUAL IEEE INTERNATIONAL CONFERENCE AND WORKSHOP ON THE ENGINEERING OF COMPUTER-BASED SYSTEMS, PROCEEDINGS, 2002, : 231 - 237