Scalable Distributed Stream Join Processing

Cited by: 50
Authors
Lin, Qian [1]
Ooi, Beng Chin [1]
Wang, Zhengkui [1]
Yu, Cui [2]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[2] Monmouth Univ, Dept Comp Sci & Software Engn, West Long Branch, NJ USA
Funding
National Research Foundation, Singapore;
Keywords
QUERIES;
DOI
10.1145/2723372.2746485
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Efficient and scalable stream joins play an important role in performing real-time analytics for many cloud applications. However, as in conventional database processing, online theta-joins over data streams are computationally expensive; moreover, because they are processed in memory, they impose high memory requirements on the system. In this paper, we propose a novel stream join model, called join-biclique, which organizes a large cluster as a complete bipartite graph. Join-biclique has several strengths over state-of-the-art techniques, including memory efficiency, elasticity and scalability. These features are essential for building efficient and scalable streaming systems. Based on join-biclique, we develop a scalable distributed stream join system, BiStream, over a large-scale commodity cluster. Specifically, BiStream is designed to support efficient full-history joins, window-based joins and online data aggregation. BiStream also supports adaptive resource management to dynamically scale the system out and down according to its application workloads. We provide both theoretical cost analysis and extensive experimental evaluations to assess the efficiency, elasticity and scalability of BiStream.
Pages: 811-825
Page count: 15