Exploiting Common Subexpressions for Cloud Query Processing

Cited by: 21
Authors
Silva, Yasin N. [1]
Larson, Per-Ake [2]
Zhou, Jingren [3]
Affiliations
[1] Arizona State Univ, Glendale, AZ 85306 USA
[2] Microsoft Res, Redmond, WA 98052 USA
[3] Microsoft Corp, Redmond, WA 98052 USA
Keywords
EFFICIENT;
DOI
10.1109/ICDE.2012.106
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Many companies now routinely run massive data analysis jobs - expressed in some scripting language - on large clusters of low-end servers. Many analysis scripts are complex and contain common subexpressions, that is, intermediate results that are subsequently joined and aggregated in multiple different ways. Applying conventional optimization techniques to such scripts will produce plans that execute a common subexpression multiple times, once for each consumer, which is clearly wasteful. Moreover, different consumers may have different physical requirements on the result: one consumer may want it partitioned on column A and another may want it partitioned on column B. To find a truly optimal plan, the optimizer must trade off such conflicting requirements in a cost-based manner. In this paper we show how to extend a Cascade-style optimizer to correctly optimize scripts containing common subexpressions. The approach has been prototyped in SCOPE, Microsoft's system for massive data analysis. Experimental analysis of both simple and large real-world scripts shows that the extended optimizer produces plans with 21% to 57% lower estimated costs.
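The cost-based trade-off described in the abstract can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not the paper's algorithm or SCOPE's optimizer: the names `Consumer`, `cost_recompute`, `cost_shared`, and `best_plan`, as well as the flat `compute_cost` and `repartition_cost` figures, are hypothetical. It compares evaluating a shared subexpression once per consumer against evaluating it once with a chosen partitioning and repartitioning for any consumer that needs a different column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumer:
    name: str
    partition_col: str  # partitioning this consumer requires on the shared result


def cost_recompute(consumers, compute_cost):
    # Conventional plan: the subexpression is evaluated once per consumer,
    # each time directly producing the partitioning that consumer wants.
    return compute_cost * len(consumers)


def cost_shared(consumers, compute_cost, repartition_cost, produced_col):
    # Shared plan: evaluate once, partitioned on produced_col, then
    # repartition for every consumer that needs a different column.
    mismatches = sum(1 for c in consumers if c.partition_col != produced_col)
    return compute_cost + repartition_cost * mismatches


def best_plan(consumers, compute_cost, repartition_cost):
    # Enumerate the candidate plans and keep the cheapest one.
    candidates = {("recompute", None): cost_recompute(consumers, compute_cost)}
    for col in sorted({c.partition_col for c in consumers}):
        candidates[("share", col)] = cost_shared(
            consumers, compute_cost, repartition_cost, col
        )
    choice = min(candidates, key=candidates.get)
    return choice, candidates[choice]


if __name__ == "__main__":
    # Hypothetical workload: two consumers want column A, one wants column B.
    consumers = [Consumer("c1", "A"), Consumer("c2", "B"), Consumer("c3", "A")]
    print(best_plan(consumers, compute_cost=100, repartition_cost=30))
```

In this toy model, sharing wins whenever repartitioning is cheap relative to recomputation, and the preferred partitioning is the one that leaves the fewest consumers mismatched. A real optimizer would derive these costs from statistics and consider many more physical properties than a single partitioning column.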
Pages: 1337-1348
Number of pages: 12