Distributed stream join query processing with semijoins

Cited by: 6
Authors
Tran, Tri Minh [1]
Lee, Byung Suk [1]
Affiliations
[1] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
Funding
US National Science Foundation;
Keywords
Distributed data streams; Join queries; Semijoins;
DOI
10.1007/s10619-010-7062-7
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper addresses the distributed stream processing of window-based multi-way join queries, treating the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing site for query execution, which typically introduces high communication overhead. Our observation is that the semijoin, which is effective in reducing communication overhead in distributed database query processing, can also be effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples, which requires continuous and incremental processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes our comprehensive work to address this challenge. Specifically, we first propose a distributed stream join processing model that handles the network delays introduced by the shipment of data streams and allows for efficient batch processing. Then, based on the model, we propose join algorithms for the multi-way join case: first, one-way join algorithms for different combinations of join placement and join method and, then, multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site, and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different semijoin strategies, which incur different communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously and incrementally as tuples arrive, and never ship tuples redundantly. The optimization algorithm constructs an efficient multi-way join plan using a greedy heuristic that, in each step, adds to the plan the one stream with the minimum join execution cost. Through extensive experiments, we compare the performance of the proposed one-way join algorithms, and compare the efficiency of the plans generated by the greedy optimization algorithm against those found by exhaustive search.
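The greedy heuristic mentioned in the abstract (extend a linear join order by one stream per step, choosing the stream with the minimum join execution cost) can be illustrated with a minimal sketch. This is not the paper's algorithm or cost model: the function name `greedy_join_order`, the toy cost function, the stream names, the rates, and the selectivities are all hypothetical, and starting from the first listed stream is an assumption made only for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence


def greedy_join_order(
    streams: Sequence[str],
    cost: Callable[[List[str], str], float],
) -> List[str]:
    """Build a linear join order by appending the cheapest remaining stream.

    `cost(plan, candidate)` is a caller-supplied estimate of the cost of
    joining `candidate` next, given the partial plan built so far.
    """
    if not streams:
        return []
    remaining = list(streams)
    # Assumption: the plan starts from the first listed stream.
    plan = [remaining.pop(0)]
    while remaining:
        # Greedy step: pick the stream with the minimum estimated cost.
        best = min(remaining, key=lambda s: cost(plan, s))
        plan.append(best)
        remaining.remove(best)
    return plan


# Hypothetical inputs: per-stream arrival rates and pairwise selectivities.
RATES: Dict[str, float] = {"A": 100.0, "B": 10.0, "C": 50.0, "D": 5.0}
SELECTIVITY: Dict[FrozenSet[str], float] = {frozenset(("A", "C")): 0.01}


def toy_cost(plan: List[str], candidate: str) -> float:
    """Toy cost: candidate rate scaled by its selectivity with the last stream."""
    pair = frozenset((plan[-1], candidate))
    return RATES[candidate] * SELECTIVITY.get(pair, 1.0)


plan = greedy_join_order(["A", "B", "C", "D"], toy_cost)
print(plan)  # ['A', 'C', 'D', 'B']
```

With these made-up numbers, C is chosen first because its selectivity with A makes it cheapest (0.5), then D (5) ahead of B (10). A greedy choice of this kind evaluates far fewer plans than exhaustive search but is not guaranteed to find the cheapest overall plan, which is the trade-off the abstract's experiments examine.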
Pages: 211 - 254
Page count: 44
Related papers
50 in total
  • [31] Distributed Publish/Subscribe Query Processing on the Spatio-Textual Data Stream
    Chen, Zhida
    Cong, Gao
    Zhang, Zhenjie
    Fu, Tom Z. J.
    Chen, Lisi
    2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017, : 1095 - 1106
  • [32] Surrounding Join Query Processing in Spatial Databases
    Li, Lingxiao
    Taniar, David
    Indrawan-Santiago, Maria
    Shao, Zhou
    DATABASES THEORY AND APPLICATIONS, ADC 2017, 2017, 10538 : 17 - 28
  • [33] Join Query Processing in Data Quality Management
    Yue, Mingliang
    Gao, Hong
    Shi, Shengfei
    Wang, Hongzhi
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2016, 2016, 9645 : 329 - 342
  • [34] SIMULATION OF JOIN QUERY-PROCESSING ALGORITHMS FOR A TRUSTED DISTRIBUTED DATABASE-MANAGEMENT SYSTEM
    RUBINOVITZ, H
    THURAISINGHAM, B
    INFORMATION AND SOFTWARE TECHNOLOGY, 1993, 35 (05) : 287 - 299
  • [35] A GRAPH-THEORETICAL APPROACH TO DETERMINE A JOIN REDUCER SEQUENCE IN DISTRIBUTED QUERY-PROCESSING
    CHEN, MS
    YU, PS
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1994, 6 (01) : 152 - 165
  • [36] Distributed stream join under workload variance
    Fang, Junhua
    Zhang, Rong
    Wang, Xiaotong
    Zhou, Aoying
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2017, 20 (05) : 1089 - 1110
  • [37] Distributed stream join under workload variance
    Junhua Fang
    Rong Zhang
    Xiaotong Wang
    Aoying Zhou
    World Wide Web, 2017, 20 : 1089 - 1110
  • [38] Query Rewriting in RDF Stream Processing
    Calbimonte, Jean-Paul
    Mora, Jose
    Corcho, Oscar
    SEMANTIC WEB: LATEST ADVANCES AND NEW DOMAINS, 2016, 9678 : 486 - 502
  • [39] Mode Aware Stream Query Processing
    Wei, Mingrui
    Rundensteiner, Elke
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2009, 5566 : 380 - 397
  • [40] Distributed query processing on the grid
    Smith, J
    Gounaris, A
    Watson, P
    Paton, NW
    Fernandes, AAA
    Sakellariou, R
    GRID COMPUTING - GRID 2002, 2002, 2536 : 279 - 290