Exactly-Once Semantics with Real-Time Data Pipelines

Cited by: 0
Authors
Rastogi, Avnish Kumar [1]
Malik, Naveen [2]
Hooda, Sakshi [3]
Affiliations
[1] HCL Technol, Noida, India
[2] Royal Bank Scotland, Noida, India
[3] Surajmal Inst Technol, Delhi, India
Source
AMBIENT COMMUNICATIONS AND COMPUTER SYSTEMS, RACCCS 2017 | 2018 / Vol. 696
Keywords
Exactly-once processing; Spark; Streaming; Distributed; Kafka; Redis; Vertica; NoSQL
DOI
10.1007/978-981-10-7386-1_26
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Real-time systems such as IoT platforms, recommendation systems, and fraud detection systems often need to ensure that the application processes each piece of data only once. In real-time streaming applications, a batch of data may be handed to the application multiple times, so that the application processes duplicate data. No stream processing product can unilaterally guarantee exactly-once processing semantics; the guarantee holds only under certain assumptions, or when the application and the stream processing framework collaborate in certain ways. In this paper, we present a design that addresses this problem for real-time streaming applications by achieving end-to-end exactly-once delivery. The main contribution of our work is a solution to the complex task of recovering application state after application restarts, network crashes, etc., and of detecting and filtering out-of-order duplicate data while maintaining high throughput.
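The abstract does not spell out the design itself, so the following is only a minimal illustrative sketch of one common way an application can cooperate with an at-least-once stream source: it remembers which records it has already applied and skips redelivered ones, including duplicates that arrive out of order. The record shape `(partition, offset, payload)` and the `IdempotentSink` class are hypothetical names chosen for this example, not taken from the paper. The in-memory set stands in for a durable store, which a real deployment would need to update atomically with the output.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple

# Hypothetical record shape for this sketch: (partition, offset, payload).
Record = Tuple[int, int, str]


@dataclass
class IdempotentSink:
    """Applies each (partition, offset) at most once, even when redelivered."""

    # Assumption: an in-memory set stands in for a durable store of applied keys.
    seen: Set[Tuple[int, int]] = field(default_factory=set)
    output: List[str] = field(default_factory=list)

    def process_batch(self, batch: Iterable[Record]) -> int:
        """Apply unseen records and return how many were newly applied."""
        applied = 0
        for partition, offset, payload in batch:
            key = (partition, offset)
            if key in self.seen:
                # Duplicate delivery (possibly out of order): skip it.
                continue
            # In a real system these two updates would have to be committed
            # atomically, so that a crash between them cannot cause a replay.
            self.output.append(payload)
            self.seen.add(key)
            applied += 1
        return applied


sink = IdempotentSink()
first = sink.process_batch([(0, 0, "a"), (0, 1, "b")])
# The source redelivers the first batch out of order, plus one new record.
second = sink.process_batch([(0, 1, "b"), (0, 0, "a"), (0, 2, "c")])
```

Tracking individual keys rather than a single high-water mark per partition is what lets this sketch tolerate out-of-order redelivery, at the cost of a growing key set that a production system would have to bound or expire.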
Pages: 293-303
Page count: 11