Exploiting Common Subexpressions for Cloud Query Processing

Cited by: 21
Authors
Silva, Yasin N. [1 ]
Larson, Per-Ake [2 ]
Zhou, Jingren [3 ]
Affiliations
[1] Arizona State Univ, Glendale, AZ 85306 USA
[2] Microsoft Res, Redmond, WA 98052 USA
[3] Microsoft Corp, Redmond, WA 98052 USA
Keywords
EFFICIENT;
DOI
10.1109/ICDE.2012.106
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Many companies now routinely run massive data analysis jobs, expressed in some scripting language, on large clusters of low-end servers. Many analysis scripts are complex and contain common subexpressions, that is, intermediate results that are subsequently joined and aggregated in multiple different ways. Applying conventional optimization techniques to such scripts will produce plans that execute a common subexpression multiple times, once for each consumer, which is clearly wasteful. Moreover, different consumers may have different physical requirements on the result: one consumer may want it partitioned on column A and another on column B. To find a truly optimal plan, the optimizer must trade off such conflicting requirements in a cost-based manner. In this paper we show how to extend a Cascade-style optimizer to correctly optimize scripts containing common subexpressions. The approach has been prototyped in SCOPE, Microsoft's system for massive data analysis. Experimental analysis of both simple and large real-world scripts shows that the extended optimizer produces plans with 21% to 57% lower estimated costs.
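The trade-off described in the abstract can be illustrated with a toy cost comparison. This is only a sketch under stated assumptions, not the paper's optimizer: the cost constants, `plan_cost`, and `best_plan` are hypothetical names and numbers introduced here for illustration. It compares recomputing a common subexpression once per consumer against computing it once and repartitioning the result for consumers that need a different partitioning column.

```python
# Hypothetical cost units; none of these numbers come from the paper.
COMPUTE_COST = 100      # cost to evaluate the common subexpression once
REPARTITION_COST = 30   # cost to repartition its result on another column


def plan_cost(produced_on, consumers_need, share):
    """Estimated cost of feeding every consumer of the subexpression.

    produced_on:    column the shared result is partitioned on (ignored if not shared)
    consumers_need: list of required partitioning columns, one per consumer
    share:          True to compute once and reuse, False to recompute per consumer
    """
    if not share:
        # Conventional plan: one evaluation per consumer, each produced
        # directly with the partitioning that consumer needs.
        return COMPUTE_COST * len(consumers_need)
    # Shared plan: evaluate once, then repartition for every consumer
    # whose requirement differs from how the result was produced.
    mismatches = sum(1 for need in consumers_need if need != produced_on)
    return COMPUTE_COST + REPARTITION_COST * mismatches


def best_plan(consumers_need):
    """Return (cost, strategy, partition_column) with the lowest estimated cost."""
    candidates = [(plan_cost(None, consumers_need, False), "recompute", None)]
    for col in sorted(set(consumers_need)):
        candidates.append((plan_cost(col, consumers_need, True), "share", col))
    return min(candidates, key=lambda c: c[0])


if __name__ == "__main__":
    # One consumer wants partitioning on A, the other on B.
    print(best_plan(["A", "B"]))
```

With these assumed numbers, sharing wins because one repartition is cheaper than a second evaluation; raising `REPARTITION_COST` above `COMPUTE_COST` flips the choice, which is why the decision has to be cost-based.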
Pages: 1337-1348
Page count: 12
Related papers
50 in total
  • [31] PIQL: Success-Tolerant Query Processing in the Cloud
    Armbrust, Michael
    Curtis, Kristal
    Kraska, Tim
    Fox, Armando
    Franklin, Michael J.
    Patterson, David A.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2011, 5 (03): : 181 - 192
  • [32] Towards Cost-Optimal Query Processing in the Cloud
    Leis, Viktor
    Kuschewski, Maximilian
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 14 (09): : 1606 - 1612
  • [33] Fast and Secure kNN Query Processing in Cloud Computing
    Lei, Xinyu
    Tu, Guan-Hua
    Liu, Alex X.
    Xie, Tian
    2020 IEEE CONFERENCE ON COMMUNICATIONS AND NETWORK SECURITY (CNS), 2020,
  • [34] Efficient In-Memory Point Cloud Query Processing
    Teuscher, Balthasar
    Geissendoerfer, Oliver
    Luo, Xuanshu
    Li, Hao
    Anders, Katharina
    Holst, Christoph
    Werner, Martin
    RECENT ADVANCES IN 3D GEOINFORMATION SCIENCE, 3D GEOINFO 2023, 2024, : 267 - 286
  • [35] Continuous Cloud-Scale Query Optimization and Processing
    Bruno, Nicolas
    Jain, Sapna
    Zhou, Jingren
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (11): : 961 - 972
  • [36] Exploiting early sorting and early partitioning for decision support query processing
    J. Claussen
    A. Kemper
    D. Kossmann
    C. Wiesner
    The VLDB Journal, 2000, 9 : 190 - 213
  • [37] Exploiting early sorting and early partitioning for decision support query processing
    Claussen, J
    Kemper, A
    Kossmann, D
    Wiesner, C
    VLDB JOURNAL, 2000, 9 (03): : 190 - 213
  • [38] Distributed top-k query processing by exploiting skyline summaries
    Vlachou, Akrivi
    Doulkeridis, Christos
    Norvag, Kjetil
    DISTRIBUTED AND PARALLEL DATABASES, 2012, 30 (3-4) : 239 - 271
  • [39] Distributed top-k query processing by exploiting skyline summaries
    Akrivi Vlachou
    Christos Doulkeridis
    Kjetil Nørvåg
    Distributed and Parallel Databases, 2012, 30 : 239 - 271
  • [40] A higher-order strategy for eliminating common subexpressions
    Resler, R. Daniel
    Winter, Victor
    COMPUTER LANGUAGES SYSTEMS & STRUCTURES, 2009, 35 (04) : 341 - 364