Continuous Cloud-Scale Query Optimization and Processing

Times cited: 27
Authors
Bruno, Nicolas [1]
Jain, Sapna [2]
Zhou, Jingren [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98008 USA
[2] Indian Inst Technol, Bombay, Maharashtra, India
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2013 / Vol. 6 / No. 11
DOI
10.14778/2536222.2536223
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Massive data analysis in cloud-scale data centers plays a crucial role in making critical business decisions. High-level scripting languages free developers from understanding various system trade-offs, but introduce new challenges for query optimization. One key optimization challenge is the lack of accurate data statistics, typically due to massive data volumes and their distributed nature, complex computation logic, and frequent use of user-defined functions. In this paper we propose novel techniques to adapt query processing in the Scope system, the cloud-scale computation environment in Microsoft Online Services. We continuously monitor query execution, collect actual runtime statistics, and adapt parallel execution plans as the query executes. We discuss similarities and differences between our approach and alternatives proposed in the context of traditional centralized systems. Experiments on large-scale Scope production clusters show that the proposed techniques systematically solve the challenge of missing or inaccurate data statistics, detect and resolve partition skew and suboptimal plan structure, and improve query latency severalfold for real workloads. Although we focus on optimizing high-level languages, the same ideas are also applicable to MapReduce systems.
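The abstract describes runtime adaptation only at a high level. As a rough illustration of the general idea, and not the Scope implementation (whose internals the abstract does not specify), the minimal Python sketch below assumes that each completed stage reports its compile-time row estimate and its actual per-partition row counts. It flags the remaining plan for re-optimization when the estimate is off by more than an assumed factor (the "q-error", the ratio of the larger to the smaller of estimated and actual rows) or when some partition is far larger than the median partition. The names `StageStats`, `estimation_error`, `skewed_partitions`, and `should_reoptimize`, as well as both thresholds, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class StageStats:
    """Runtime statistics for one completed stage (hypothetical structure)."""
    name: str
    estimated_rows: int        # optimizer's compile-time cardinality estimate
    partition_rows: list[int]  # actual rows observed in each output partition


def estimation_error(stats: StageStats) -> float:
    """q-error: ratio of the larger to the smaller of estimated and actual rows."""
    actual = sum(stats.partition_rows)
    est, act = max(stats.estimated_rows, 1), max(actual, 1)  # avoid division by zero
    return max(est / act, act / est)


def skewed_partitions(stats: StageStats, factor: float = 4.0) -> list[int]:
    """Indices of partitions holding more than `factor` times the median row count."""
    threshold = factor * max(median(stats.partition_rows), 1)
    return [i for i, rows in enumerate(stats.partition_rows) if rows > threshold]


def should_reoptimize(stats: StageStats, max_q_error: float = 10.0) -> bool:
    """Re-plan the remaining stages if the estimate was far off or skew was observed."""
    return estimation_error(stats) > max_q_error or bool(skewed_partitions(stats))


if __name__ == "__main__":
    stage = StageStats("extract", estimated_rows=1_000,
                       partition_rows=[200_000, 210_000, 195_000, 2_400_000])
    print(estimation_error(stage))   # 3005.0
    print(skewed_partitions(stage))  # [3]
    print(should_reoptimize(stage))  # True
```

Using the median rather than the mean as the skew baseline keeps a single oversized partition from inflating the threshold it is compared against. A real system would also need to decide what to do once a stage is flagged (for example, re-running the optimizer on the remaining plan with the observed statistics, or repartitioning the skewed data), which this sketch leaves out.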
Pages: 961-972
Number of pages: 12