Scalable Distributed Stream Join Processing

Cited by: 50
Authors
Lin, Qian [1]
Ooi, Beng Chin [1]
Wang, Zhengkui [1]
Yu, Cui [2]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[2] Monmouth Univ, Dept Comp Sci & Software Engn, West Long Branch, NJ USA
Funding
National Research Foundation, Singapore
Keywords
QUERIES
DOI
10.1145/2723372.2746485
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Efficient and scalable stream joins play an important role in performing real-time analytics for many cloud applications. However, as in conventional database processing, online theta-joins over data streams are computationally expensive; moreover, because they are memory-based, they impose high memory requirements on the system. In this paper, we propose a novel stream join model, called join-biclique, which organizes a large cluster as a complete bipartite graph. Join-biclique has several strengths over state-of-the-art techniques, including memory efficiency, elasticity and scalability. These features are essential for building efficient and scalable streaming systems. Based on join-biclique, we develop a scalable distributed stream join system, BiStream, over a large-scale commodity cluster. Specifically, BiStream is designed to support efficient full-history joins, window-based joins and online data aggregation. BiStream also supports adaptive resource management to dynamically scale the system out and down according to its application workloads. We provide both a theoretical cost analysis and extensive experimental evaluations of the efficiency, elasticity and scalability of BiStream.
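The abstract describes join-biclique only at a high level: the cluster is organized as a complete bipartite graph. The Python sketch below shows one simple, single-process reading of that idea for a full-history theta-join. It assumes that m units keep tuples of stream R and n units keep tuples of stream S, that each arriving tuple is stored on exactly one unit of its own side (chosen at random here), and that it is probed against every unit on the opposite side. The class `JoinBiclique`, its methods and the random placement policy are hypothetical illustrations, not BiStream's actual routing, partitioning or elasticity mechanisms.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Tuple  # hypothetical row representation: a tuple of attribute values


@dataclass
class JoinUnit:
    """One processing unit; it stores tuples of a single relation only."""
    relation: str
    stored: List[Row] = field(default_factory=list)


class JoinBiclique:
    """Hypothetical sketch of a join-biclique for a theta-join of R and S.

    The cluster is split into m R-units and n S-units; conceptually every
    R-unit is connected to every S-unit (a complete bipartite graph K_{m,n}).
    Not BiStream's implementation.
    """

    def __init__(self, m: int, n: int,
                 theta: Callable[[Row, Row], bool], seed: int = 0) -> None:
        self.units: Dict[str, List[JoinUnit]] = {
            "R": [JoinUnit("R") for _ in range(m)],
            "S": [JoinUnit("S") for _ in range(n)],
        }
        self.theta = theta            # join predicate theta(r, s) -> bool
        self.rng = random.Random(seed)

    def ingest(self, relation: str, row: Row) -> List[Tuple[Row, Row]]:
        """Process one arriving tuple and return the (r, s) pairs it produces."""
        opposite = "S" if relation == "R" else "R"
        results = []
        # Probe first: send the tuple to every unit on the opposite side,
        # since an arbitrary theta predicate gives no way to prune units.
        for unit in self.units[opposite]:
            for other in unit.stored:
                r, s = (row, other) if relation == "R" else (other, row)
                if self.theta(r, s):
                    results.append((r, s))
        # Then store exactly one copy on a randomly chosen unit of its own
        # side, so stored state is never replicated across units.
        self.rng.choice(self.units[relation]).stored.append(row)
        return results

    def stored_count(self) -> int:
        """Total number of tuples held across all units."""
        return sum(len(u.stored) for side in self.units.values() for u in side)


# Usage: an inequality join R.a < S.a over a small interleaved stream.
bj = JoinBiclique(m=3, n=2, theta=lambda r, s: r[0] < s[0])
pairs = []
for rel, row in [("R", (1,)), ("S", (2,)), ("R", (3,)), ("S", (5,)), ("R", (4,))]:
    pairs.extend(bj.ingest(rel, row))
print(sorted(pairs))      # [((1,), (2,)), ((1,), (5,)), ((3,), (5,)), ((4,), (5,))]
print(bj.stored_count())  # 5
```

Because each tuple is stored once and probes before it is stored, every matching pair is emitted exactly once, by whichever of its two tuples arrives later, and no stored tuple is copied to several units. This single-threaded sketch ignores distributed concerns such as message ordering between units; a window-based variant could additionally evict tuples that fall outside the window before probing, which is not shown here.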
Pages: 811-825
Page count: 15
Related papers
  • [1] Distributed Adaptive Windowed Stream Join Processing
    Tran, Tri Minh
    Lee, Byung Suk
    INTERNATIONAL JOURNAL OF DISTRIBUTED SYSTEMS AND TECHNOLOGIES, 2011, 2 (02) : 59 - 81
  • [2] Distributed stream join query processing with semijoins
    Tran, Tri Minh
    Lee, Byung Suk
    DISTRIBUTED AND PARALLEL DATABASES, 2010, 27 (03) : 211 - 254
  • [3] Simois: A Scalable Distributed Stream Join System with Skewed Workloads
    Zhang, Fan
    Chen, Hanhua
    Jin, Hai
    2019 39TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2019), 2019, : 176 - 185
  • [4] Stream-aware indexing for distributed inequality join processing
    Aslam, Adeel
    Simonini, Giovanni
    Gagliardelli, Luca
    Zecchini, Luca
    Bergamaschi, Sonia
    INFORMATION SYSTEMS, 2024, 125
  • [5] Semi-Stream Similarity Join Processing in a Distributed Environment
    Kim, Hong-Ji
    Lee, Ki-Hoon
    IEEE ACCESS, 2020, 8 : 130194 - 130204
  • [6] Distributed Stream KNN Join
    Shahvarani, Amirhesam
    Jacobsen, Hans-Arno
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 1597 - 1609
  • [7] SPSPS: A scalable parallel-distributed stream processing system
    Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
    TIEN TZU HSUEH PAO, 4 (639-646)
  • [8] Distributed and scalable sequential pattern mining through stream processing
    Chen, Chun-Chieh
    Shuai, Hong-Han
    Chen, Ming-Syan
    KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 53 (02) : 365 - 390