Exploiting Common Subexpressions for Cloud Query Processing

Cited by: 21
Authors
Silva, Yasin N. [1]
Larson, Per-Ake [2]
Zhou, Jingren [3]
Affiliations
[1] Arizona State Univ, Glendale, AZ 85306 USA
[2] Microsoft Res, Redmond, WA 98052 USA
[3] Microsoft Corp, Redmond, WA 98052 USA
Keywords
EFFICIENT;
DOI
10.1109/ICDE.2012.106
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Many companies now routinely run massive data analysis jobs - expressed in some scripting language - on large clusters of low-end servers. Many analysis scripts are complex and contain common subexpressions, that is, intermediate results that are subsequently joined and aggregated in multiple different ways. Applying conventional optimization techniques to such scripts will produce plans that execute a common subexpression multiple times, once for each consumer, which is clearly wasteful. Moreover, different consumers may have different physical requirements on the result: one consumer may want it partitioned on column A and another may want it partitioned on column B. To find a truly optimal plan, the optimizer must trade off such conflicting requirements in a cost-based manner. In this paper we show how to extend a Cascade-style optimizer to correctly optimize scripts containing common subexpressions. The approach has been prototyped in SCOPE, Microsoft's system for massive data analysis. Experimental analysis of both simple and large real-world scripts shows that the extended optimizer produces plans with 21% to 57% lower estimated costs.
Pages: 1337-1348
Page count: 12