Continuous Cloud-Scale Query Optimization and Processing

Cited by: 27
Authors
Bruno, Nicolas [1]
Jain, Sapna [2]
Zhou, Jingren [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98008 USA
[2] Indian Inst Technol, Bombay, Maharashtra, India
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2013 / Volume 6 / Issue 11
DOI
10.14778/2536222.2536223
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Massive data analysis in cloud-scale data centers plays a crucial role in making critical business decisions. High-level scripting languages free developers from understanding various system trade-offs, but introduce new challenges for query optimization. One key optimization challenge is the lack of accurate data statistics, typically due to massive data volumes and their distributed nature, complex computation logic, and frequent use of user-defined functions. In this paper we propose novel techniques to adapt query processing in the Scope system, the cloud-scale computation environment in Microsoft Online Services. We continuously monitor query execution, collect actual runtime statistics, and adapt parallel execution plans as the query executes. We discuss similarities and differences between our approach and alternatives proposed in the context of traditional centralized systems. Experiments on large-scale Scope production clusters show that the proposed techniques systematically solve the challenge of missing or inaccurate data statistics, detect and resolve partition skew and plan structure, and improve query latency severalfold for real workloads. Although we focus on optimizing high-level languages, the same ideas are also applicable to MapReduce systems.
Pages: 961-972
Page count: 12