Scalable Distributed Stream Join Processing

Cited by: 50
Authors
Lin, Qian [1]
Ooi, Beng Chin [1]
Wang, Zhengkui [1]
Yu, Cui [2]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[2] Monmouth Univ, Dept Comp Sci & Software Engn, West Long Branch, NJ USA
Funding
National Research Foundation, Singapore;
Keywords
QUERIES;
DOI
10.1145/2723372.2746485
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Efficient and scalable stream joins play an important role in real-time analytics for many cloud applications. However, as in conventional database processing, online theta-joins over data streams are computationally expensive; moreover, because they are processed in memory, they impose high memory requirements on the system. In this paper, we propose a novel stream join model, called join-biclique, which organizes a large cluster as a complete bipartite graph. Join-biclique has several strengths over state-of-the-art techniques, including memory efficiency, elasticity and scalability. These features are essential for building efficient and scalable streaming systems. Based on join-biclique, we develop a scalable distributed stream join system, BiStream, over a large-scale commodity cluster. Specifically, BiStream is designed to support efficient full-history joins, window-based joins and online data aggregation. BiStream also supports adaptive resource management to dynamically scale the system out and down according to its application workloads. We provide both theoretical cost analysis and extensive experimental evaluations of the efficiency, elasticity and scalability of BiStream.
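The abstract states only that join-biclique organizes the cluster as a complete bipartite graph. As an illustration, the following is a minimal single-process sketch of one way such a layout could evaluate a theta-join. It is not BiStream's implementation. The class and method names (`Unit`, `JoinBiclique`, `insert_r`, `insert_s`), the round-robin placement, and the store-on-own-side, probe-opposite-side routing are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Unit:
    """Hypothetical processing unit holding tuples of one relation only."""
    stored: List[int] = field(default_factory=list)


class JoinBiclique:
    """Toy model: R-side and S-side units form a complete bipartite graph.

    Assumption for illustration: each arriving tuple is stored in exactly
    one unit on its own side and probed against every unit on the other side.
    """

    def __init__(self, n_r: int, n_s: int,
                 predicate: Callable[[int, int], bool]) -> None:
        self.r_units = [Unit() for _ in range(n_r)]
        self.s_units = [Unit() for _ in range(n_s)]
        self.predicate = predicate  # theta condition over (r, s)
        self._r_count = 0
        self._s_count = 0

    def insert_r(self, r: int) -> List[Tuple[int, int]]:
        # Store once on the R side (round-robin placement is an assumption).
        self.r_units[self._r_count % len(self.r_units)].stored.append(r)
        self._r_count += 1
        # Probe all S-side units; only earlier S tuples are visible here,
        # so each matching pair is emitted by whichever tuple arrives second.
        return [(r, s) for u in self.s_units for s in u.stored
                if self.predicate(r, s)]

    def insert_s(self, s: int) -> List[Tuple[int, int]]:
        # Symmetric to insert_r: store on the S side, probe the R side.
        self.s_units[self._s_count % len(self.s_units)].stored.append(s)
        self._s_count += 1
        return [(r, s) for u in self.r_units for r in u.stored
                if self.predicate(r, s)]


if __name__ == "__main__":
    bic = JoinBiclique(n_r=2, n_s=3, predicate=lambda r, s: r < s)
    for side, value in [("R", 1), ("S", 5), ("R", 3), ("S", 2), ("R", 7)]:
        matches = bic.insert_r(value) if side == "R" else bic.insert_s(value)
        print(side, value, matches)
```

In this sketch each tuple is kept in a single unit, which is one way to read the memory-efficiency claim. Adding a unit to either side only changes where future tuples are placed and which units are probed.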
Pages: 811-825
Page count: 15
Related papers
50 in total
  • [21] A Scalable Distributed Private Stream Search System
    Zhang, Peng
    Li, Yan
    Liu, Qingyun
    Lin, Hailun
    2015 IEEE 35TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS (ICDCSW), 2015, : 128 - 135
  • [22] Robust Distributed Stream Processing
    Lei, Chuan
    Rundensteiner, Elke A.
    Guttman, Joshua D.
    2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2013, : 817 - 828
  • [23] Distributed Stream Processing with DUP
    Bader, Kai Christian
    Eissler, Tilo
    Evans, Nathan
    GauthierDickey, Chris
    Grothoff, Christian
    Grothoff, Krista
    Keene, Jeff
    Meier, Harald
    Ritzdorf, Craig
    Rutherford, Matthew J.
    NETWORK AND PARALLEL COMPUTING, 2010, 6289 : 232 - +
  • [24] Efficient Sliding Window Join in Data Stream Processing
    Kim, Hyeon Gyu
    ADVANCED MULTIMEDIA AND UBIQUITOUS ENGINEERING: FUTURE INFORMATION TECHNOLOGY, VOL 2, 2016, 354 : 375 - 381
  • [25] Efficient Stream Join Processing: Novel Approaches and Challenges
    Aslam, Adeel
    Simonini, Giovanni
    PROCEEDINGS OF THE 33RD INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, HPDC 2024, 2024,
  • [26] Reliable stream data processing for elastic distributed stream processing systems
    Xiaohui Wei
    Yuan Zhuang
    Hongliang Li
    Zhiliang Liu
    Cluster Computing, 2020, 23 : 555 - 574
  • [27] Reliable stream data processing for elastic distributed stream processing systems
    Wei, Xiaohui
    Zhuang, Yuan
    Li, Hongliang
    Liu, Zhiliang
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2020, 23 (02): : 555 - 574
  • [28] Algorithms for sliding window join over distributed data stream
    Liu, Xuejun
    Qian, Jiangbo
    Jisuanji Gongcheng/Computer Engineering, 2006, 32 (21): : 41 - 43
  • [29] FastJoin: A Skewness-Aware Distributed Stream Join System
    Zhou, Shunjie
    Zhang, Fan
    Chen, Hanhua
    Jin, Hai
    Zhou, Bing Bing
    2019 IEEE 33RD INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2019), 2019, : 1042 - 1052
  • [30] An adaptive join strategy in distributed data stream management system
    Li, Xiaojing
    Gu, Yu
    Yue, Dejun
    Yu, Ge
    CIS: 2007 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PROCEEDINGS, 2007, : 271 - +