Vertical Partitioning for Query Processing over Raw Data

Cited by: 7
Authors
Zhao, Weijie [1]
Cheng, Yu [1]
Rusu, Florin [1]
Affiliation
[1] UC Merced, Merced, CA 95340 USA
Keywords
DESIGN;
DOI
10.1145/2791347.2791369
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Traditional databases are not equipped with adequate functionality to handle the volume and variety of "Big Data". Strict schema definition and data loading are prerequisites even for the most primitive query session. Raw data processing has been proposed as a schema-on-demand alternative that provides instant access to the data. When loading is an option, it is driven exclusively by the currently running query, resulting in sub-optimal performance across a query workload. In this paper, we investigate the problem of workload-driven raw data processing with partial loading. We model loading as fully-replicated binary vertical partitioning. We provide a linear mixed integer programming optimization formulation that we prove to be NP-hard. We design a two-stage heuristic that comes within close range of the optimal solution in a fraction of the time. We extend the optimization formulation and the heuristic to pipelined raw data processing, a scenario in which data access and extraction are executed concurrently. We provide three case studies over real data formats that confirm the accuracy of the model when implemented in a state-of-the-art pipelined operator for raw data processing.
Pages: 12
Related papers
50 in total
  • [31] Probabilistic Threshold Range Aggregate Query Processing over Uncertain Data
    Yang, Shuxiang
    Zhang, Wenjie
    Zhang, Ying
    Lin, Xuemin
    ADVANCES IN DATA AND WEB MANAGEMENT, PROCEEDINGS, 2009, 5446 : 51 - +
  • [32] Federated SPARQL Query Processing over Heterogeneous Linked Data Fragments
    Heling, Lars
    Acosta, Maribel
    PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022, : 1047 - 1057
  • [33] Similarity query processing algorithm over data stream based on LCSS
    Wang, Shaopeng
    Wen, Yingyou
    Zhao, Hong
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2015, 52 (09): : 1976 - 1991
  • [34] Query Processing over Data Warehouse using Relational Databases and NoSQL
    Carniel, Anderson Chaves
    Sa, Aried de Aguiar
    Porto Brisighello, Vinicius Henrique
    Ribeiro, Marcela Xavier
    Bueno, Renato
    Ciferri, Ricardo Rodrigues
    de Aguiar Ciferri, Cristina Dutra
    2012 XXXVIII CONFERENCIA LATINOAMERICANA EN INFORMATICA (CLEI), 2012,
  • [35] Towards Query Processing over Heterogeneous Federations of RDF Data Sources
    Cheng, Sijin
    Hartig, Olaf
    SEMANTIC WEB: ESWC 2022 SATELLITE EVENTS, 2022, 13384 : 57 - 62
  • [36] Continuous Query Processing Over Data, Streams and Services: Application to Robotics
    Scuturici, Vasile-Marian
    Gripay, Yann
    Petit, Jean-Marc
    Deguchi, Yutaka
    Suzuki, Einoshin
    NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS (ADBIS 2015), 2015, 539 : 36 - 43
  • [37] Query Processing on Large Graphs: Scalability Through Partitioning
    Bodra, Jay
    Das, Soumyava
    Santra, Abhishek
    Chakravarthy, Sharma
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY (DAWAK 2018), 2018, 11031 : 271 - 288
  • [38] Query Optimization in Encrypted Relational Databases by Vertical Schema Partitioning
    Canim, Mustafa
    Kantarcioglu, Murat
    Inan, Ali
    SECURE DATA MANAGEMENT, PROCEEDINGS, 2009, 5776 : 1 - 16
  • [39] Linked Data Query Processing
    Hartig, Olaf
    Oezsu, M. Tamer
    2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014, : 1286 - 1289
  • [40] Data analysis for query processing
    Robinson, J
    Lowden, BGT
    ADVANCES IN INTELLIGENT DATA ANALYSIS: REASONING ABOUT DATA, 1997, 1280 : 447 - 458