Parallel query processing in a polystore

Cited by: 0
Authors
Pavlos Kranas
Boyan Kolev
Oleksandra Levchenko
Esther Pacitti
Patrick Valduriez
Ricardo Jiménez-Peris
Marta Patiño-Martinez
Affiliations
[1] LeanXcale
[2] Distributed Systems Lab at Universidad Politécnica de Madrid
[3] Inria
[4] University of Montpellier
[5] CNRS
[6] LIRMM
Keywords
Database integration; Heterogeneous databases; Distributed and parallel databases; Polystores; Query languages; Query processing;
DOI
Not available
Abstract
The blooming of different data stores has made polystores a major topic in the cloud and big data landscape. As the amount of data grows rapidly, it becomes critical to exploit the inherent parallel processing capabilities of underlying data stores and data processing platforms. To fully achieve this, a polystore should: (i) preserve the expressivity of each data store’s native query or scripting language and (ii) leverage a distributed architecture to enable parallel data integration, i.e. joins, on top of parallel retrieval of underlying partitioned datasets. In this paper, we address these points by: (i) using the polyglot approach of the CloudMdsQL query language that allows native queries to be expressed as inline scripts and combined with SQL statements for ad-hoc integration and (ii) incorporating the approach within the LeanXcale distributed query engine, thus allowing for native scripts to be processed in parallel at data store shards. In addition, (iii) efficient optimization techniques, such as bind join, can take place to improve the performance of selective joins. We evaluate the performance benefits of exploiting parallelism in combination with high expressivity and optimization through our experimental validation.
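The bind join mentioned in the abstract can be illustrated with a minimal sketch. This is not the CloudMdsQL or LeanXcale implementation; it only shows the general idea under simplifying assumptions: the selective side of the join is evaluated first, its distinct join keys are collected, and those keys are pushed down as a filter to the other data store so that only matching rows are retrieved. The names `bind_join`, `fetch_orders`, `CUSTOMERS`, and `ORDERS` are hypothetical, and both data stores are simulated with in-memory lists.

```python
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]


def bind_join(
    outer_rows: Iterable[Row],
    fetch_inner: Callable[[Set[object]], Iterable[Row]],
    outer_key: str,
    inner_key: str,
) -> List[Row]:
    """Join outer_rows with rows fetched from another store (illustrative only).

    fetch_inner stands in for a native subquery that accepts a set of join
    keys, similar in spirit to a `WHERE key IN (...)` filter.
    """
    outer_rows = list(outer_rows)
    # Step 1: collect the distinct join keys from the (selective) outer side.
    keys = {row[outer_key] for row in outer_rows}
    # Step 2: push the keys down so the inner store returns only matching rows.
    inner_index: Dict[object, List[Row]] = {}
    for row in fetch_inner(keys):
        inner_index.setdefault(row[inner_key], []).append(row)
    # Step 3: combine each outer row with its matching inner rows.
    return [
        {**outer, **inner}
        for outer in outer_rows
        for inner in inner_index.get(outer[outer_key], [])
    ]


# Hypothetical data standing in for two separate data stores.
CUSTOMERS: List[Row] = [
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
]
ORDERS: List[Row] = [
    {"order_id": 10, "customer": 1, "total": 5.0},
    {"order_id": 11, "customer": 3, "total": 7.5},
    {"order_id": 12, "customer": 1, "total": 2.0},
    {"order_id": 13, "customer": 4, "total": 9.0},
]


def fetch_orders(customer_ids: Set[object]) -> List[Row]:
    # Simulates a filtered retrieval at the second store.
    return [o for o in ORDERS if o["customer"] in customer_ids]


# A selective predicate on the outer side keeps the key set small.
selected = [c for c in CUSTOMERS if c["name"] == "Ada"]
joined = bind_join(selected, fetch_orders, "cust_id", "customer")
```

In this sketch only the orders of the selected customer leave the second store, which is why the technique pays off for selective joins; with a non-selective outer side the key set grows and the benefit shrinks.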
Pages: 939-977
Page count: 38
Related papers
50 in total
  • [21] Efficient parallel query processing by graph ranking
    Dereniowski, D
    Kubale, M
    FUNDAMENTA INFORMATICAE, 2006, 69 (03): 273-285
  • [22] Query Languages for Polystore Databases for Large Scientific Data Archives
    Poudel, Manoj
    Shrestha, Shashank
    Sarode, Rashmi P.
    Chu, Wanming
    Bhalla, Suhhash
    2019 9TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2019), 2019: 185-190
  • [23] Parallel selection query processing involving index in parallel database systems
    Rahayu, JW
    Taniar, D
    I-SPAN'02: INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND NETWORKS, PROCEEDINGS, 2002: 309-314
  • [24] Resource scheduling for parallel query processing on computational grids
    Gounaris, A
    Sakellariou, R
    Paton, NW
    Fernandes, AAA
    FIFTH IEEE/ACM INTERNATIONAL WORKSHOP ON GRID COMPUTING, PROCEEDINGS, 2004: 396-401
  • [25] QScheduler: A Tool for Parallel Query Processing in Database Systems
    Zhang, Qingfeng
    Li, Shanshan
    Xu, Jing
    2014 19TH INTERNATIONAL CONFERENCE ON ENGINEERING OF COMPLEX COMPUTER SYSTEMS (ICECCS 2014), 2014: 73-76
  • [26] Parallel group-by query processing in a cluster architecture
    Taniar, D
    Rahayu, JW
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2002, 17 (01): 23-39
  • [27] Parallel Query Processing: To Separate Communication from Computation
    Zhang, Hao
    Yu, Jeffrey Xu
    Zhang, Yikai
    Zhao, Kangfei
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022: 1447-1461
  • [28] Validated cost models for parallel OQL query processing
    Sampaio, SDM
    Paton, NW
    Smith, J
    Watson, P
    OBJECT-ORIENTED INFORMATION SYSTEMS, PROCEEDINGS, 2002, 2425: 60-75
  • [29] Centralized architecture for parallel query processing on networks of workstations
    Zeng, SJ
    Dandamudi, SP
    HIGH-PERFORMANCE COMPUTING AND NETWORKING, PROCEEDINGS, 1999, 1593: 683-692
  • [30] Range sum query processing in parallel data warehouses
    Li, JZ
    Gao, H
    PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT'2003, PROCEEDINGS, 2003: 877-881