Parallel query processing in a polystore

Cited by: 0
Authors
Pavlos Kranas
Boyan Kolev
Oleksandra Levchenko
Esther Pacitti
Patrick Valduriez
Ricardo Jiménez-Peris
Marta Patiño-Martinez
Affiliations
[1] LeanXcale
[2] Distributed Systems Lab at Universidad Politécnica de Madrid
[3] Inria
[4] University of Montpellier
[5] CNRS
[6] LIRMM
Source
Distributed and Parallel Databases, 2021, 39 (04)
Keywords
Database integration; Heterogeneous databases; Distributed and parallel databases; Polystores; Query languages; Query processing;
DOI
Not available
Abstract
The blooming of different data stores has made polystores a major topic in the cloud and big data landscape. As the amount of data grows rapidly, it becomes critical to exploit the inherent parallel processing capabilities of underlying data stores and data processing platforms. To fully achieve this, a polystore should: (i) preserve the expressivity of each data store’s native query or scripting language and (ii) leverage a distributed architecture to enable parallel data integration, i.e. joins, on top of parallel retrieval of underlying partitioned datasets. In this paper, we address these points by: (i) using the polyglot approach of the CloudMdsQL query language that allows native queries to be expressed as inline scripts and combined with SQL statements for ad-hoc integration and (ii) incorporating the approach within the LeanXcale distributed query engine, thus allowing for native scripts to be processed in parallel at data store shards. In addition, (iii) efficient optimization techniques, such as bind join, can take place to improve the performance of selective joins. We evaluate the performance benefits of exploiting parallelism in combination with high expressivity and optimization through our experimental validation.
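The bind join mentioned in the abstract can be illustrated with a minimal sketch. This is not the CloudMdsQL or LeanXcale implementation: the function `bind_join`, the `orders` table, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a remote data store. The idea shown is only the general technique: collect the join keys from the smaller side, bind them into the other side's query as a filter, and join the reduced result locally.

```python
import sqlite3


def bind_join(left_rows, left_key, right_conn, right_table, right_key):
    """Join left_rows (a list of dicts) with right_table, fetching only matching rows.

    Hypothetical helper for illustration; table and column names are trusted input.
    """
    # Step 1: collect the distinct join keys from the (small) left side.
    keys = sorted({row[left_key] for row in left_rows})
    if not keys:
        return []

    # Step 2: bind the keys into the right-side query as an IN filter,
    # so the remote store returns only rows that can participate in the join.
    placeholders = ", ".join("?" for _ in keys)
    sql = f"SELECT * FROM {right_table} WHERE {right_key} IN ({placeholders})"
    right_conn.row_factory = sqlite3.Row
    right_rows = [dict(r) for r in right_conn.execute(sql, keys).fetchall()]

    # Step 3: finish with a local hash join on the reduced right side.
    index = {}
    for r in right_rows:
        index.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left_rows for r in index.get(l[left_key], [])]


# Assumed sample data: an in-memory SQLite table plays the role of a data store.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")
store.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 5.0), (2, 20, 7.5), (3, 10, 2.5), (4, 30, 9.0)],
)

# The selective side of the join: only two customers survive an earlier filter.
customers = [{"customer_id": 10, "name": "Ada"}, {"customer_id": 40, "name": "Bob"}]

result = bind_join(customers, "customer_id", store, "orders", "customer_id")
```

In this sketch only the two orders of customer 10 leave the store; the other rows are filtered at the source, which is where the benefit for selective joins comes from.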
Pages: 939-977
Page count: 38
Related papers
50 in total
  • [1] Parallel query processing in a polystore
    Kranas, Pavlos
    Kolev, Boyan
    Levchenko, Oleksandra
    Pacitti, Esther
    Valduriez, Patrick
    Jimenez-Peris, Ricardo
    Patino-Martinez, Marta
    DISTRIBUTED AND PARALLEL DATABASES, 2021, 39 (04) : 939 - 977
  • [2] Adaptive parallel query processing
    Tok, WH
    Zhao, L
    Bressan, S
    PDPTA'2001: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, 2001, : 590 - 597
  • [3] Skew in Parallel Query Processing
    Beame, Paul
    Koutris, Paraschos
    Suciu, Dan
    PODS'14: PROCEEDINGS OF THE 33RD ACM SIGMOD-SIGACT-SIGART SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2014, : 212 - 223
  • [4] Parallel spatial join query processing
    Liu, Yu
    Sun, Li
    Tian, Yong-Qing
    Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University, 2002, 36 (04): : 512 - 515
  • [5] BigDAWG Polystore Query Optimization Through Semantic Equivalences
    She, Zuohao
    Ravishankar, Surabhi
    Duggan, Jennie
    2016 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2016,
  • [6] Parallel Approach in RDF Query Processing
    Vajgl, Marek
    Parenica, Jan
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON NUMERICAL ANALYSIS AND APPLIED MATHEMATICS 2016 (ICNAAM-2016), 2017, 1863
  • [7] Integrating query processing with parallel languages
    Myers, Brandon
    Oskin, Mark
    Howe, Bill
    2015 13TH IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW), 2015, : 240 - 244
  • [8] Communication Steps for Parallel Query Processing
    Beame, Paul
    Koutris, Paraschos
    Suciu, Dan
    JOURNAL OF THE ACM, 2017, 64 (06)
  • [9] SPARQL Query Parallel Processing: A Survey
    Feng, Jiaying
    Meng, Chenhong
    Song, Jiaming
    Zhang, Xiaowang
    Feng, Zhiyong
    Zou, Lei
    2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017), 2017, : 444 - 451
  • [10] Parallel query processing for OLAP in grids
    Kotowski, Nelson
    Lima, Alexandre A. B.
    Pacitti, Esther
    Valduriez, Patrick
    Mattoso, Marta
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2008, 20 (17): : 2039 - 2048