Parallel query processing in a polystore

Authors
Pavlos Kranas
Boyan Kolev
Oleksandra Levchenko
Esther Pacitti
Patrick Valduriez
Ricardo Jiménez-Peris
Marta Patiño-Martinez
Affiliations
[1] LeanXcale
[2] Distributed Systems Lab at Universidad Politécnica de Madrid
[3] Inria
[4] University of Montpellier
[5] CNRS
[6] LIRMM
Keywords
Database integration; Heterogeneous databases; Distributed and parallel databases; Polystores; Query languages; Query processing
Abstract
The proliferation of different data stores has made polystores a major topic in the cloud and big data landscape. As the amount of data grows rapidly, it becomes critical to exploit the inherent parallel processing capabilities of the underlying data stores and data processing platforms. To fully achieve this, a polystore should: (i) preserve the expressivity of each data store’s native query or scripting language and (ii) leverage a distributed architecture to enable parallel data integration, i.e., joins, on top of parallel retrieval of the underlying partitioned datasets. In this paper, we address these points by: (i) using the polyglot approach of the CloudMdsQL query language, which allows native queries to be expressed as inline scripts and combined with SQL statements for ad hoc integration, and (ii) incorporating this approach within the LeanXcale distributed query engine, thus allowing native scripts to be processed in parallel at data store shards. In addition, (iii) efficient optimization techniques, such as bind join, can be applied to improve the performance of selective joins. Our experimental validation evaluates the performance benefits of exploiting parallelism in combination with high expressivity and optimization.
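The bind join mentioned in the abstract can be illustrated with a small sketch. This is not the CloudMdsQL or LeanXcale implementation; it only assumes the general idea of a bind join, in which the join keys retrieved from one side are pushed as a filter into the query sent to the other side, so that only matching rows are fetched. The function name `bind_join`, the `orders` table, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a remote data store.

```python
import sqlite3


def bind_join(left_rows, left_key, conn, right_table, right_key):
    """Join left_rows with right_table, fetching only right-side rows
    whose join key also appears on the left side."""
    # Step 1: collect the distinct join keys from the already retrieved left side.
    keys = sorted({row[left_key] for row in left_rows})
    if not keys:
        return []

    # Step 2: push the keys into the right-side query as an IN filter.
    # Table and column names are interpolated for brevity (illustration only);
    # the key values themselves are passed as bound parameters.
    placeholders = ",".join("?" for _ in keys)
    sql = f"SELECT * FROM {right_table} WHERE {right_key} IN ({placeholders})"
    cursor = conn.execute(sql, keys)
    columns = [d[0] for d in cursor.description]

    # Step 3: index the (already filtered) right rows by key and join locally.
    right_by_key = {}
    for values in cursor:
        record = dict(zip(columns, values))
        right_by_key.setdefault(record[right_key], []).append(record)

    return [
        {**left, **right}
        for left in left_rows
        for right in right_by_key.get(left[left_key], [])
    ]


# Hypothetical data: a small "customers" result from one store and an
# "orders" table held in another store (simulated here with SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 5.0), (2, 11, 7.5), (3, 12, 1.0), (4, 10, 2.5)],
)
customers = [
    {"customer_id": 10, "name": "Ada"},
    {"customer_id": 12, "name": "Lin"},
]

result = bind_join(customers, "customer_id", conn, "orders", "customer_id")
for row in result:
    print(row)
```

In this sketch the order belonging to customer 11 is never transferred from the right-hand store, which is where the benefit for selective joins comes from: the more selective the left side, the fewer rows the right side has to return.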
Pages: 939-977
Number of pages: 38