From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System

Cited by: 58
Authors
Chu, Shumo [1]
Balazinska, Magdalena [1]
Suciu, Dan [1]
Affiliations
[1] Univ Washington, Comp Sci & Engn, Seattle, WA 98195 USA
DOI
10.1145/2723372.2750545
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Big data analytics often requires processing complex queries using massive parallelism, where the main performance metric is the communication cost incurred during data reshuffling. In this paper, we describe a system that can efficiently compute complex join queries, including queries with cyclic joins, on a massively parallel architecture. We build on two independent lines of work for multi-join query evaluation: a communication-optimal algorithm for distributed evaluation, and a worst-case optimal algorithm for sequential evaluation. We evaluate these algorithms together, then describe novel, practical optimizations for both algorithms.
Pages: 63-78
Number of pages: 16