Scalable Distributed Stream Join Processing

Cited by: 50
Authors
Lin, Qian [1]
Ooi, Beng Chin [1]
Wang, Zhengkui [1]
Yu, Cui [2]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[2] Monmouth Univ, Dept Comp Sci & Software Engn, West Long Branch, NJ USA
Funding
National Research Foundation, Singapore;
Keywords
QUERIES;
DOI
10.1145/2723372.2746485
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Efficient and scalable stream joins play an important role in performing real-time analytics for many cloud applications. However, as in conventional database processing, online theta-joins over data streams are computationally expensive; moreover, because they are processed in memory, they impose high memory requirements on the system. In this paper, we propose a novel stream join model, called join-biclique, which organizes a large cluster as a complete bipartite graph. Join-biclique has several strengths over state-of-the-art techniques, including memory efficiency, elasticity and scalability. These features are essential for building efficient and scalable streaming systems. Based on join-biclique, we develop a scalable distributed stream join system, BiStream, over a large-scale commodity cluster. Specifically, BiStream is designed to support efficient full-history joins, window-based joins and online data aggregation. BiStream also supports adaptive resource management to dynamically scale the system out and down according to its application workloads. We provide both a theoretical cost analysis and extensive experimental evaluations of the efficiency, elasticity and scalability of BiStream.
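The abstract only states that join-biclique arranges the cluster as a complete bipartite graph; it does not spell out how tuples are routed. The following is a minimal single-process sketch of one plausible reading, not the BiStream implementation: units are split into an R side and an S side, each incoming tuple is stored on exactly one unit of its own side, and it is compared against every unit on the opposite side. The names `Unit`, `JoinBiclique` and `insert`, the round-robin placement, and the use of plain integers as tuples are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


class Unit:
    """One processing unit; it stores tuples of exactly one relation."""

    def __init__(self) -> None:
        self.stored: List[int] = []


class JoinBiclique:
    """Toy model: R-side units and S-side units form a complete bipartite graph."""

    def __init__(self, n_r: int, n_s: int, theta: Callable[[int, int], bool]) -> None:
        self.r_units = [Unit() for _ in range(n_r)]
        self.s_units = [Unit() for _ in range(n_s)]
        self.theta = theta                 # join predicate theta(r, s)
        self._next = {"R": 0, "S": 0}      # round-robin cursors (assumed placement policy)

    def insert(self, relation: str, value: int) -> List[Tuple[int, int]]:
        """Store the tuple once on its own side, then probe every opposite-side unit."""
        if relation == "R":
            own, opposite = self.r_units, self.s_units
        else:
            own, opposite = self.s_units, self.r_units

        # Each tuple is kept in a single place, so memory is not replicated.
        idx = self._next[relation] % len(own)
        self._next[relation] += 1
        own[idx].stored.append(value)

        # Every edge of the bipartite graph carries the probe to one opposite unit.
        results: List[Tuple[int, int]] = []
        for unit in opposite:
            for other in unit.stored:
                r, s = (value, other) if relation == "R" else (other, value)
                if self.theta(r, s):
                    results.append((r, s))
        return results


if __name__ == "__main__":
    jb = JoinBiclique(n_r=2, n_s=3, theta=lambda r, s: r < s)
    for rel, val in [("R", 1), ("S", 5), ("R", 7), ("S", 3), ("R", 2)]:
        print(rel, val, jb.insert(rel, val))
```

In this sketch each matching pair is emitted exactly once, by whichever tuple arrives later, and every tuple occupies one unit only; window-based joins, aggregation and elastic scaling mentioned in the abstract are not modelled.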
Pages: 811-825
Page count: 15
Related papers
50 in total
  • [31] Optimal component composition for scalable stream processing
    Gu, XH
    Yu, PS
    Nahrstedt, K
    25th IEEE International Conference on Distributed Computing Systems, Proceedings, 2005, : 773 - 782
  • [32] Towards Scalable and Expressive Stream Packet Processing
    Fais, Alessandra
    Lettieri, Giuseppe
    Procissi, Gregorio
    Giordano, Stefano
    2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2021,
  • [33] Samza: Stateful Scalable Stream Processing at LinkedIn
    Noghabi, Shadi A.
    Paramasivam, Kartik
    Pan, Yi
    Ramesh, Navina
    Bringhurst, Jon
    Gupta, Indranil
    Campbell, Roy H.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (12): : 1634 - 1645
  • [34] KerA: Scalable Data Ingestion for Stream Processing
    Marcu, Ovidiu-Cristian
    Costan, Alexandru
    Antoniu, Gabriel
    Perez-Hernandez, Maria S.
    Nicolae, Bogdan
    Tudoran, Radu
    Bortoli, Stefano
    2018 IEEE 38TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2018, : 1480 - 1485
  • [35] Scalable Storage Support for Data Stream Processing
    Sebepou, Zoe
    Magoutis, Kostas
    2010 IEEE 26TH SYMPOSIUM ON MASS STORAGE SYSTEMS AND TECHNOLOGIES (MSST), 2010,
  • [36] COMBINING JOIN AND SEMI-JOIN OPERATIONS FOR DISTRIBUTED QUERY-PROCESSING
    CHEN, MS
    YU, PS
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1993, 5 (03) : 534 - 542
  • [37] Input-Sensitive Scalable Continuous Join Query Processing
    Agarwal, Pankaj K.
    Xie, Junyi
    Yang, Jun
    Yu, Hai
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2009, 34 (03):
  • [38] A distributed and scalable architecture for packet processing
    Roabtmili, B
    Yazdani, N
    Nourani, M
    APCC 2003: 9TH ASIA-PACIFIC CONFERENCE ON COMMUNICATION, VOLS 1-3, PROCEEDINGS, 2003, : 983 - 987
  • [39] From a Stream of Relational Queries to Distributed Stream Processing
    Zou, Qiong
    Wang, Huayong
    Soule, Robert
    Hirzel, Martin
    Andrade, Henrique
    Gedik, Bugra
    Wu, Kun-Lung
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (02): : 1394 - 1405
  • [40] Load distribution for distributed stream processing
    Xing, Y
    CURRENT TRENDS IN DATABASE TECHNOLOGY - EDBT 2004 WORKSHOPS, PROCEEDINGS, 2004, 3268 : 112 - 120