Distributed stream join query processing with semijoins

Cited by: 6
Authors
Tran, Tri Minh [1]
Lee, Byung Suk [1]
Affiliations
[1] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
Funding
National Science Foundation (USA)
Keywords
Distributed data streams; Join queries; Semijoins
DOI
10.1007/s10619-010-7062-7
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper addresses the distributed stream processing of window-based multi-way join queries, treating the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites must be shipped to the processing site for query execution, which typically introduces high communication overhead. Our observation is that the semijoin, which is effective in reducing communication overhead in distributed database query processing, can also be effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples: it requires continuous, incremental processing of an unbounded sequence of tuples rather than one-time processing of a set of stored tuples. This paper describes our comprehensive work to address this challenge. Specifically, we first propose a distributed stream join processing model that handles the network delays introduced by shipping data streams and allows for efficient batch processing. Then, based on the model, we propose join algorithms for the multi-way join case: first, one-way join algorithms for different combinations of join placement and join method, and then multi-way join algorithms assuming linear join ordering. Two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site, and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different semijoin strategies, which incur different communication overheads. We present a complete set of join algorithms covering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously and incrementally as tuples arrive, and never ship tuples redundantly.
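The difference between the two join methods can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses the classic semijoin strategy from distributed databases (ship the join-key values first, then ship only the matching tuples) on a single batch, and it ignores windows, network delays, and incremental execution. The function names, the dictionary-based tuple layout, and the cost unit (number of attribute values shipped) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, int]  # hypothetical tuple layout: attribute name -> value


def _hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Equi-join two batches of tuples on `key` using a hash table."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def simple_join(remote: List[Row], local: List[Row], key: str) -> Tuple[List[Row], int]:
    """Ship every full remote tuple, then join at the processing site."""
    shipped = sum(len(t) for t in remote)  # assumed cost: attribute values sent
    return _hash_join(remote, local, key), shipped


def semijoin_based_join(remote: List[Row], local: List[Row], key: str) -> Tuple[List[Row], int]:
    """Ship join-key values to the remote site, then ship back only matches."""
    keys = {t[key] for t in local}          # step 1: distinct keys go out
    shipped = len(keys)
    reduced = [t for t in remote if t[key] in keys]  # step 2: remote-side filter
    shipped += sum(len(t) for t in reduced)          # step 3: matches come back
    return _hash_join(reduced, local, key), shipped


# Illustrative data only: six remote tuples, three local tuples.
remote_batch = [{"k": i, "a": i * 10, "b": i * 100} for i in range(6)]
local_window = [{"k": 1, "c": 7}, {"k": 4, "c": 9}, {"k": 4, "c": 11}]

full_result, full_cost = simple_join(remote_batch, local_window, "k")
semi_result, semi_cost = semijoin_based_join(remote_batch, local_window, "k")
print(full_cost, semi_cost, full_result == semi_result)
```

Both functions return the same join result; under this toy cost measure the semijoin variant ships less whenever few remote tuples have a matching key, and it can ship more when most of them match.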
The optimization algorithm constructs an efficient multi-way join plan using a greedy heuristic that, in each step, adds to the plan the stream with the minimum join execution cost. Through extensive experiments, we compare the performance of the proposed one-way join algorithms, and we compare the efficiency of the plans generated by the greedy optimization algorithm with those found by exhaustive search.
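The greedy heuristic described above can be sketched as follows, assuming a linear join order and a caller-supplied cost function. The abstract does not give the paper's cost model, so `toy_cost`, the stream names, the arrival rates, and the selectivities below are hypothetical placeholders chosen only to make the example run.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

CostFn = Callable[[List[str], str], float]


def greedy_join_order(streams: Sequence[str], cost: CostFn) -> List[str]:
    """Build a linear join order by repeatedly adding the cheapest stream."""
    remaining = list(streams)
    plan: List[str] = []
    while remaining:
        # Pick the stream whose join with the current plan costs the least.
        best = min(remaining, key=lambda s: cost(plan, s))
        plan.append(best)
        remaining.remove(best)
    return plan


# Hypothetical statistics, not taken from the paper.
rates: Dict[str, float] = {"A": 100.0, "B": 10.0, "C": 50.0}
selectivity: Dict[FrozenSet[str], float] = {
    frozenset({"A", "B"}): 0.1,
    frozenset({"B", "C"}): 0.5,
    frozenset({"A", "C"}): 0.01,
}


def toy_cost(plan: List[str], stream: str) -> float:
    """Assumed cost: arrival rate scaled by selectivity against each planned stream."""
    c = rates[stream]
    for p in plan:
        c *= selectivity[frozenset({p, stream})]
    return c


print(greedy_join_order(["A", "B", "C"], toy_cost))
```

With these numbers the heuristic picks B first (lowest rate), then A, then C. A greedy choice costs one pass over the remaining streams per step, whereas exhaustive search would evaluate every permutation, which is the trade-off the experiments in the abstract compare.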
Pages: 211-254
Number of pages: 44
Related papers
50 in total
  • [21] Efficient distance join query processing in distributed spatial data management systems
    Garcia-Garcia, Francisco
    Corral, Antonio
    Iribarne, Luis
    Vassilakopoulos, Michael
    Manolopoulos, Yannis
    INFORMATION SCIENCES, 2020, 512 : 985 - 1008
  • [22] Storing Join Relationships for Fast Join Query Processing
    Hamdi, Mohammed
    Yu, Feng
    Alswedani, Sarah
    Hou, Wen-Chi
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2017, PT I, 2017, 10438 : 167 - 177
  • [23] Parallel spatial join query processing
    Liu, Yu
    Sun, Li
    Tian, Yong-Qing
    Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University, 2002, 36 (04) : 512 - 515
  • [24] Query-Centric Failure Recovery for Distributed Stream Processing Engines
    Su, Li
    Zhou, Yongluan
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 1276 - 1279
  • [25] Adaptive SQL Query Optimization in Distributed Stream Processing: A Preliminary Study
    Sharkova, Darya
    Chernokoz, Alexander
    Trofimov, Artem
    Sokolov, Nikita
    Gorshkova, Ekaterina
    Kuralenok, Igor
    Novikov, Boris
    SOFTWARE FOUNDATIONS FOR DATA INTEROPERABILITY, SFDI 2021, 2022, 1457 : 96 - 109
  • [26] Leveraging distributed Publish/Subscribe systems for scalable stream query processing
    Zhou, Yongluan
    Tan, Kian-Lee
    Yu, Feng
    BUSINESS INTELLIGENCE FOR THE REAL-TIME ENTERPRISES, 2007, 4365 : 20 - +
  • [27] Lightweight Distributed Execution Engine for Large-Scale Spatial Join Query Processing
    Zhang, Jianting
    You, Simin
    Gruenwald, Le
    2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015, 2015, : 150 - 157
  • [28] JQPro: Join Query Processing in a Distributed System for Big RDF Data Using the Hash-Merge Join Technique
    Elzein, Nahla Mohammed
    Majid, Mazlina Abdul
    Hashem, Ibrahim Abaker Targio
    Ibrahim, Ashraf Osman
    Abulfaraj, Anas W.
    Binzagr, Faisal
    MATHEMATICS, 2023, 11 (05)
  • [29] Data stream query processing
    Koudas, N
    Srivastava, D
    ICDE 2005: 21ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2005, : 1145 - 1145
  • [30] Data stream query processing
    Koudas, N
    Srivastava, D
    FOURTH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS ENGINEERING, PROCEEDINGS, 2003, : 374 - 374