Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance

Cited by: 0
Authors
Gevay, Gabor E. [1]
Rabl, Tilmann [2]
Bress, Sebastian [4]
Madai-Tahy, Lorand [1]
Quiane-Ruiz, Jorge-Arnulfo [1, 3]
Markl, Volker [1, 3]
Affiliations
[1] Tech Univ Berlin, TU Berlin, Berlin, Germany
[2] Uni Potsdam, Hasso Plattner Inst, Potsdam, Germany
[3] DFKI, Berlin, Germany
[4] Snowflake Inc, Bozeman, MT USA
Keywords
Iterative dataflow; Loop pipelining; Loop-invariant hoisting; ALGORITHMS; PLATFORM
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Modern data analysis tasks often involve control flow statements, such as iterations. Common examples are PageRank and K-means. To achieve scalability, developers usually implement data analysis tasks in distributed dataflow systems, such as Spark and Flink. However, for tasks with control flow statements, these systems still either suffer from poor performance or are hard to use. For example, while Flink supports iterations and Spark provides ease-of-use, Flink is hard to use and Spark has poor performance for iterative tasks. As a result, developers typically have to implement different workarounds to run their jobs with control flow statements in an easy and efficient way. We propose Mitos, a system that achieves the best of both worlds: it achieves both high performance and ease-of-use. Mitos uses an intermediate representation that abstracts away specific control flow statements and is able to represent any imperative control flow. This facilitates building the dataflow graph and coordinating the distributed execution of control flow in a way that is not tied to specific control flow constructs. Our experimental evaluation shows that the performance of Mitos is more than one order of magnitude better than systems that launch new dataflow jobs for every iteration step. Remarkably, it is also up to 10.5 times faster than Flink, which has native iteration support, while matching the ease-of-use of Spark.
Pages: 1428-1439
Page count: 12