Distributed stream join query processing with semijoins

Cited by: 6
Authors
Tran, Tri Minh [1]
Lee, Byung Suk [1]
Affiliations
[1] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
Funding
U.S. National Science Foundation
Keywords
Distributed data streams; Join queries; Semijoins;
DOI
10.1007/s10619-010-7062-7
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper addresses the distributed stream processing of window-based multi-way join queries, considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing site for query execution. This typically introduces high communication overhead. Our observation is that the semijoin, effective in reducing communication overhead in distributed database query processing, can also be effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples, as it requires continuous and incremental processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes our comprehensive work to address this challenge. Specifically, we first propose a distributed stream join processing model that handles the network delays introduced by the shipment of data streams and allows for efficient batch processing. Then, based on the model, we propose join algorithms for the multi-way join case: first, one-way join algorithms for different combinations of join placement and join method and, then, multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site, and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different possible semijoin strategies that incur different communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously in an incremental manner as tuples arrive, and never ship tuples redundantly.
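The difference between the two join methods can be illustrated with a minimal sketch. It is not the paper's algorithm: it applies the textbook semijoin reduction to two static windows, and it assumes a cost of one unit per attribute value shipped. The names `simple_join`, `semijoin_based_join`, `local_window`, and `remote_window` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def _hash_join(local: List[Row], remote: List[Row], key: str) -> List[Row]:
    """Join two lists of rows on `key` at the processing site."""
    index: Dict[object, List[Row]] = {}
    for row in local:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in remote for l in index.get(r[key], [])]


def simple_join(local: List[Row], remote: List[Row], key: str) -> Tuple[List[Row], int]:
    """Simple join: every full remote tuple is shipped to the processing site."""
    # Assumed cost model: one unit per attribute value shipped.
    shipped = sum(len(r) for r in remote)
    return _hash_join(local, remote, key), shipped


def semijoin_based_join(local: List[Row], remote: List[Row], key: str) -> Tuple[List[Row], int]:
    """Semijoin-based join: partial tuples first, full tuples only for matches."""
    # Step 1: the remote site forwards partial tuples (distinct join-key values).
    remote_keys = {r[key] for r in remote}
    shipped = len(remote_keys)
    # Step 2: the processing site replies with the keys that have a local match.
    wanted = remote_keys & {l[key] for l in local}
    shipped += len(wanted)
    # Step 3: the remote site ships full tuples for the matching keys only.
    matching = [r for r in remote if r[key] in wanted]
    shipped += sum(len(r) for r in matching)
    return _hash_join(local, matching, key), shipped


# Made-up window contents, for illustration only.
local_window = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
remote_window = [
    {"k": 1, "b": "p", "c": "q"},
    {"k": 3, "b": "r", "c": "s"},
    {"k": 4, "b": "t", "c": "u"},
]

if __name__ == "__main__":
    print(simple_join(local_window, remote_window, "k"))
    print(semijoin_based_join(local_window, remote_window, "k"))
```

On this toy data both methods produce the same single joined tuple, while the simple join ships 9 attribute values and the semijoin-based join ships 7. The saving depends on how selective the join is and on how wide the remote tuples are; with a low-selectivity join the extra round trip can make the semijoin more expensive.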
The optimization algorithm constructs an efficient multi-way join plan by using a greedy heuristic that, in each step, adds to the plan the stream with the minimum join execution cost. Through extensive experiments, we compare the performance of the proposed one-way join algorithms, and we compare the efficiency of the plans generated by the greedy optimization algorithm with that of the plans found by exhaustive search.
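The greedy heuristic and the exhaustive baseline can be sketched as follows. This is an illustration under stated assumptions, not the paper's optimizer: the starting stream is given, the plan is a linear join order, and the cost of adding a stream is supplied by a caller-defined function. The names `greedy_plan`, `exhaustive_plan`, `toy_cost`, and the cost table `TOY_COSTS` are hypothetical, and the cost values are made up.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# step_cost(plan_so_far, candidate) -> cost of joining `candidate` next.
CostFn = Callable[[Sequence[str], str], float]


def greedy_plan(start: str, others: List[str], step_cost: CostFn) -> Tuple[List[str], float]:
    """Build a linear join order by always adding the cheapest next stream."""
    plan, total = [start], 0.0
    remaining = list(others)
    while remaining:
        best = min(remaining, key=lambda s: step_cost(plan, s))
        total += step_cost(plan, best)
        plan.append(best)
        remaining.remove(best)
    return plan, total


def exhaustive_plan(start: str, others: List[str], step_cost: CostFn) -> Tuple[List[str], float]:
    """Try every linear order of the remaining streams and keep the cheapest."""
    best_plan: List[str] = []
    best_total = float("inf")
    for order in permutations(others):
        plan, total = [start], 0.0
        for s in order:
            total += step_cost(plan, s)
            plan.append(s)
        if total < best_total:
            best_plan, best_total = plan, total
    return best_plan, best_total


# Made-up costs: joining stream `s` right after stream `prev` costs TOY_COSTS[(prev, s)].
TOY_COSTS = {
    ("A", "B"): 1, ("A", "C"): 2, ("A", "D"): 4,
    ("B", "C"): 5, ("B", "D"): 5,
    ("C", "B"): 1, ("C", "D"): 1,
    ("D", "B"): 1, ("D", "C"): 1,
}


def toy_cost(plan: Sequence[str], candidate: str) -> float:
    """Toy cost model that looks only at the last stream in the plan."""
    return float(TOY_COSTS[(plan[-1], candidate)])


if __name__ == "__main__":
    print(greedy_plan("A", ["B", "C", "D"], toy_cost))
    print(exhaustive_plan("A", ["B", "C", "D"], toy_cost))
```

The toy costs are chosen so that the two searches disagree: the greedy search returns the order A, B, C, D with total cost 7.0, while the exhaustive search finds A, C, D, B with total cost 4.0. This only shows that a greedy choice can miss the optimum in principle; it says nothing about the experimental results reported in the paper. The exhaustive search examines every permutation of the remaining streams, so its running time grows factorially with the number of streams, whereas the greedy search needs a quadratic number of cost evaluations.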
Pages: 211-254
Page count: 44