Vertical Partitioning for Query Processing over Raw Data

Cited by: 7
Authors
Zhao, Weijie [1]
Cheng, Yu [1]
Rusu, Florin [1]
Affiliations
[1] UC Merced, Merced, CA 95340 USA
Keywords
DESIGN
DOI
10.1145/2791347.2791369
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Traditional databases are not equipped with adequate functionality to handle the volume and variety of "Big Data". Strict schema definition and data loading are prerequisites even for the most primitive query session. Raw data processing has been proposed as a schema-on-demand alternative that provides instant access to the data. When loading is an option, it is driven exclusively by the currently running query, resulting in sub-optimal performance across a query workload. In this paper, we investigate the problem of workload-driven raw data processing with partial loading. We model loading as fully-replicated binary vertical partitioning. We provide a linear mixed integer programming optimization formulation that we prove to be NP-hard. We design a two-stage heuristic that comes within close range of the optimal solution in a fraction of the time. We extend the optimization formulation and the heuristic to pipelined raw data processing, a scenario in which data access and extraction are executed concurrently. We provide three case studies over real data formats that confirm the accuracy of the model when implemented in a state-of-the-art pipelined operator for raw data processing.
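To make the idea of workload-driven partial loading concrete, the following is a minimal sketch of binary vertical partitioning: each attribute is either loaded into the database or left in the raw file, and the choice is made for the whole workload under a loading budget. The cost model (`raw_scan`, `extract`, `db_read`), the function names, and the exhaustive search are all illustrative assumptions; they are not the paper's mixed integer programming formulation or its two-stage heuristic.

```python
from itertools import combinations


def workload_cost(loaded, queries, raw_scan, extract, db_read):
    """Total workload cost for a given set of loaded attributes.

    Hypothetical cost model (an assumption, not taken from the paper):
    - every loaded attribute a query needs costs `db_read`;
    - if any needed attribute is not loaded, the query pays one pass over
      the raw file (`raw_scan`) plus `extract` per missing attribute.
    """
    total = 0
    for attrs, weight in queries:
        missing = attrs - loaded
        cost = db_read * len(attrs & loaded)
        if missing:
            cost += raw_scan + extract * len(missing)
        total += weight * cost
    return total


def best_partition(attributes, queries, budget, raw_scan, extract, db_read):
    """Exhaustively search all attribute sets of size <= budget.

    Exponential in the number of attributes, so only usable on toy inputs.
    """
    best_set, best_cost = frozenset(), float("inf")
    for k in range(budget + 1):
        for combo in combinations(sorted(attributes), k):
            loaded = frozenset(combo)
            cost = workload_cost(loaded, queries, raw_scan, extract, db_read)
            if cost < best_cost:
                best_set, best_cost = loaded, cost
    return best_set, best_cost


# Toy workload: (attributes accessed, query frequency).
queries = [
    (frozenset({"A1", "A2"}), 3),
    (frozenset({"A2", "A3"}), 1),
    (frozenset({"A4"}), 1),
]
attributes = {"A1", "A2", "A3", "A4"}

loaded, cost = best_partition(
    attributes, queries, budget=2, raw_scan=10, extract=2, db_read=1
)
print(sorted(loaded), cost)  # ['A1', 'A2'] 31
```

In this toy setting, loading A1 and A2 serves the most frequent query entirely from the database, while the other two queries still scan the raw file. A choice driven only by a single query would not account for the frequencies of the others, which is the motivation the abstract gives for optimizing over the workload.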
Pages: 12
Related papers
50 in total
  • [1] Adaptive Query Processing on RAW Data
    Karpathiotakis, Manos
    Branco, Miguel
    Alagiannis, Ioannis
    Ailamaki, Anastasia
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 7 (12) : 1119 - 1130
  • [2] DIAERESIS: RDF data partitioning and query processing on SPARK
    Troullinou, Georgia
    Agathangelos, Giannis
    Kondylakis, Haridimos
    Stefanidis, Kostas
    Plexousakis, Dimitris
    SEMANTIC WEB, 2024, 15 (05) : 1763 - 1789
  • [3] NoDB in Action: Adaptive Query Processing on Raw Data
    Alagiannis, Ioannis
    Borovica, Renata
    Branco, Miguel
    Idreos, Stratos
    Ailamaki, Anastasia
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2012, 5 (12) : 1942 - 1945
  • [4] Optimization of query processing through constrained vertical partitioning of relational tables
    Liu, Zhenjie
    Getta, Janusz R.
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON DATABASES AND APPLICATIONS, 2006, : 221 - +
  • [5] An evaluation of vertical class partitioning for query processing in object-oriented databases
    Fung, CW
    Karlapalem, K
    Li, Q
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, 14 (05) : 1095 - 1118
  • [6] Query Execution for RDF Data using Structure Indexed Vertical Partitioning
    Shah, Bhavik
    Padiya, Trupti
    Bhise, Minal
    2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, 2015 : 575 - 584
  • [7] Jigsaw: A Data Storage and Query Processing Engine for Irregular Table Partitioning
    Kang, Donghe
    Jiang, Ruochen
    Blanas, Spyros
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021 : 898 - 911
  • [8] Efficient query processing on relational data-partitioning index structures
    Kriegel, HP
    Kunath, P
    Pfeifle, M
    Renz, M
    16TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2004 : 119 - 122
  • [9] Query processing over object views of relational data
    Fahl G.
    Risch T.
    The VLDB Journal, 1997, 6 (4) : 261 - 281
  • [10] Adaptive processing for continuous query over data stream
    Bae, Misook
    Hwang, Buhyun
    Nam, Jiseung
    PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS, 2007, 4742 : 347 - 358