Continuous Cloud-Scale Query Optimization and Processing

Cited by: 27
Authors
Bruno, Nicolas [1]
Jain, Sapna [2]
Zhou, Jingren [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98008 USA
[2] Indian Inst Technol, Bombay, Maharashtra, India
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2013 / Vol. 6 / No. 11
DOI
10.14778/2536222.2536223
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Massive data analysis in cloud-scale data centers plays a crucial role in making critical business decisions. High-level scripting languages free developers from understanding various system trade-offs, but introduce new challenges for query optimization. One key optimization challenge is missing accurate data statistics, typically due to massive data volumes and their distributed nature, complex computation logic, and frequent usage of user-defined functions. In this paper we propose novel techniques to adapt query processing in the Scope system, the cloud-scale computation environment in Microsoft Online Services. We continuously monitor query execution, collect actual runtime statistics, and adapt parallel execution plans as the query executes. We discuss similarities and differences between our approach and alternatives proposed in the context of traditional centralized systems. Experiments on large-scale Scope production clusters show that the proposed techniques systematically solve the challenge of missing/inaccurate data statistics, detect and resolve partition skew and plan structure, and improve query latency by a few folds for real workloads. Although we focus on optimizing high-level languages, the same ideas are also applicable for MapReduce systems.
Pages: 961 - 972
Page count: 12
Related papers
50 in total
  • [21] Large-Scale Spatial Join Query Processing in Cloud
    You, Simin
    Zhang, Jianting
    Gruenwald, Le
    2015 13TH IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS (ICDEW), 2015, : 34 - 41
  • [22] Cloud-Scale Application Performance Monitoring with SDN and NFV
    Liu, Guyue
    Wood, Timothy
    2015 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E 2015), 2015, : 440 - 445
  • [23] NIMBUS: Cloud-scale Attack Detection and Mitigation
    Miao, Rui
    Yu, Minlan
    Jain, Navendu
    SIGCOMM'14: PROCEEDINGS OF THE 2014 ACM CONFERENCE ON SPECIAL INTEREST GROUP ON DATA COMMUNICATION, 2014, : 121 - 122
  • [24] Cloud-Scale Genomic Signals Processing for Robust Large-Scale Cancer Genomic Microarray Data Analysis
    Harvey, Benjamin Simeon
    Ji, Soo-Yeon
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2017, 21 (01) : 238 - 245
  • [25] ON THE USE OF MESOSCALE AND CLOUD-SCALE MODELS IN OPERATIONAL FORECASTING
    BROOKS, HE
    DOSWELL, CA
    MADDOX, RA
    WEATHER AND FORECASTING, 1992, 7 (01) : 120 - 132
  • [26] Automated Intelligent Healing in Cloud-Scale Data Centers
    Li, Rui
    Cheng, Zhinan
    Lee, Patrick P. C.
    Wang, Pinghui
    Qiang, Yi
    Lan, Lin
    He, Cheng
    Lu, Jinlong
    Wang, Mian
    Ding, Xinquan
    2021 40TH INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS 2021), 2021, : 244 - 253
  • [27] MOLECULAR CLOUD-SCALE STAR FORMATION IN NGC 300
    Faesi, Christopher M.
    Lada, Charles J.
    Forbrich, Jan
    Menten, Karl M.
    Bouy, Herve
    ASTROPHYSICAL JOURNAL, 2014, 789 (01):
  • [28] xMeta: SSD-HDD-hybrid Optimization for Metadata Maintenance of Cloud-scale Object Storage
    Chen, Yan
    Ke, Qiwen
    Li, Huiba
    Wu, Yongwei
    Zhang, Yiming
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2024, 21 (02)
  • [29] Xerxes: Distributed Load Generator for Cloud-scale Experimentation
    Kesavan, Mukil
    Gavrilovska, Ada
    Schwan, Karsten
    PROCEEDINGS OF THE 2012 SEVENTH OPEN CIRRUS SUMMIT (OCS 2012), 2012, : 20 - 24
  • [30] New Systems Opportunities in Cloud-Scale Data Center
    Chiueh, Tzi-Cker
    2016 INTERNATIONAL SYMPOSIUM ON VLSI TECHNOLOGY, SYSTEMS AND APPLICATION (VLSI-TSA), 2016,