Vertical Partitioning for Query Processing over Raw Data

Cited by: 7
Authors
Zhao, Weijie [1]
Cheng, Yu [1]
Rusu, Florin [1]
Affiliations
[1] UC Merced, Merced, CA 95340 USA
Keywords
DESIGN
DOI
10.1145/2791347.2791369
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Traditional databases are not equipped with adequate functionality to handle the volume and variety of "Big Data". Strict schema definition and data loading are prerequisites even for the most primitive query session. Raw data processing has been proposed as a schema-on-demand alternative that provides instant access to the data. When loading is an option, it is driven exclusively by the currently running query, resulting in sub-optimal performance across a query workload. In this paper, we investigate the problem of workload-driven raw data processing with partial loading. We model loading as fully-replicated binary vertical partitioning. We provide a linear mixed integer programming optimization formulation that we prove to be NP-hard. We design a two-stage heuristic that comes within close range of the optimal solution in a fraction of the time. We extend the optimization formulation and the heuristic to pipelined raw data processing, a scenario in which data access and extraction are executed concurrently. We provide three case studies over real data formats that confirm the accuracy of the model when implemented in a state-of-the-art pipelined operator for raw data processing.
Pages: 12
Related papers
  • [21] RDF partitioning for scalable SPARQL query processing
    Xiaoyan Wang
    Tao Yang
    Jinchuan Chen
    Long He
    Xiaoyong Du
    Frontiers of Computer Science, 2015, 9: 919-933
  • [22] A Query Service for Raw Sensor Data
    McCann, Donall
    Roantree, Mark
    SMART SENSING AND CONTEXT, PROCEEDINGS, 2009, 5741: 38-50
  • [23] Query processing over incomplete autonomous databases: query rewriting using learned data dependencies
    Garrett Wolf
    Aravind Kalavagattu
    Hemal Khatri
    Raju Balakrishnan
    Bhaumik Chokshi
    Jianchun Fan
    Yi Chen
    Subbarao Kambhampati
    The VLDB Journal, 2009, 18: 1167-1190
  • [24] Query processing over incomplete autonomous databases: query rewriting using learned data dependencies
    Wolf, Garrett
    Kalavagattu, Aravind
    Khatri, Hemal
    Balakrishnan, Raju
    Chokshi, Bhaumik
    Fan, Jianchun
    Chen, Yi
    Kambhampati, Subbarao
    VLDB JOURNAL, 2009, 18 (05): 1167-1190
  • [25] TiQ: A Timeline query processing system over Road Traffic Data
    Imawan, Ardi
    Putri, Fadhilah
    Kwon, Joonho
    2015 IEEE INTERNATIONAL CONFERENCE ON SMART CITY/SOCIALCOM/SUSTAINCOM (SMARTCITY), 2015: 676-682
  • [26] Crowdsourcing for Top-K Query Processing over Uncertain Data
    Ciceri, Eleonora
    Fraternali, Piero
    Martinenghi, Davide
    Tagliasacchi, Marco
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (01): 41-53
  • [27] Efficient Rank Based KNN Query Processing Over Uncertain Data
    Zhang, Ying
    Lin, Xuemin
    Zhu, Gaoping
    Zhang, Wenjie
    Lin, Qianlu
    26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING ICDE 2010, 2010: 28-39
  • [28] Adaptive Secure Nearest Neighbor Query Processing Over Encrypted Data
    Li, Rui
    Liu, Alex X.
    Xu, Huanle
    Liu, Ying
    Yuan, Huaqiang
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2022, 19 (01): 91-106
  • [29] A Systematic Literature Review of Skyline Query Processing Over Data Stream
    Mohamud, Mudathir Ahmed
    Ibrahim, Hamidah
    Sidi, Fatimah
    Rum, Siti Nurulain Mohd
    Dzolkhifli, Zarina Binti
    Xiaowei, Zhang
    Lawal, Ma'aruf Mohammed
    IEEE ACCESS, 2023, 11: 72813-72835
  • [30] Crowdsourcing for Top-K Query Processing over Uncertain Data
    Ciceri, Eleonora
    Fraternali, Piero
    Martinenghi, Davide
    Tagliasacchi, Marco
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016: 1452-1453