Interminable Flows: A Generic, Joint, Customizable Resiliency Model for Big-Data Streaming Platforms

Authors
Abusalah, Bara [1]
Qadah, Thamir M. [2]
Stephen, Julian James [3]
Eugster, Patrick [1, 4]
Affiliations
[1] Purdue Univ, Elect & Comp Engn Dept, W Lafayette, IN 47907 USA
[2] Umm Al Qura Univ, Comp Syst Dept, Mecca 24382, Saudi Arabia
[3] IBM Watson Res Ctr, Yorktown Hts, NY 10598 USA
[4] Univ Svizzera Italiana, Comp Syst Inst, CH-6900 Lugano, Switzerland
Keywords
Reliability; Big Data; Checkpointing; Task analysis; Fault tolerant systems; Resource management; Batch production systems; Fault tolerance; reliability; replication; resource management systems (RMS); streaming frameworks
DOI
10.1109/ACCESS.2023.3239365
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Anyone examining cloud computing systems over the last few years will observe that a new Big Data framework emerges almost every year. Since Hadoop was developed in 2007, frameworks such as Spark, Storm, Heron, Apex, Flink, Samza, and Kafka have followed. Each framework is designed to achieve certain objectives better than other frameworks do. However, these frameworks share a few common functionalities and aspects. One vital aspect all of them strive for is better reliability and faster recovery in case of failures. This is particularly crucial for streaming systems (compared to batch processing systems), where events are processed and monitored online in real time and any delay in data delivery causes a major inconvenience to users. Another observation is that some reliability implementations are duplicated across frameworks. Encapsulating these implementations in one layer shared between applications benefits more than one framework without the burden of re-implementing the same reliability approach in each framework. These observations motivated us to present Warden, a generic, multi-framework, flexible, customizable, low-overhead protocol that ensures the resiliency of streaming applications running on streaming Big Data frameworks. Most reliability protocols carry out one rigid fault tolerance approach targeted at one system at a time. It is more challenging to provide a reliability approach that is pluggable into multiple Big Data frameworks at once, achieves low overheads comparable to approaches targeting a single framework, and yet is flexible and customizable so that users can tailor it to their objectives. Genericity is attained by providing an interface that can be used in different applications from different frameworks. Low overhead is achieved by providing faster application finish times with and without failures. Customizability is fulfilled by letting users choose between two delivery semantics (Exactly Once / At Most Once) combined with two fault tolerance guarantees (Crash Failures / Byzantine Failures). To the best of our knowledge, such an approach has never been tried on multiple streaming frameworks before. We built a prototype of Warden on the Flink and Samza (with Kafka) streaming frameworks. Our evaluations highlight the effectiveness of our approach, both in the presence of failures and without failures, compared to other fault tolerance techniques (such as checkpointing).
Pages: 10762-10776
Number of pages: 15