Exploiting Common Subexpressions for Cloud Query Processing

Cited by: 21
Authors
Silva, Yasin N. [1]
Larson, Per-Ake [2]
Zhou, Jingren [3]
Affiliations
[1] Arizona State Univ, Glendale, AZ 85306 USA
[2] Microsoft Res, Redmond, WA 98052 USA
[3] Microsoft Corp, Redmond, WA 98052 USA
Keywords
EFFICIENT;
DOI
10.1109/ICDE.2012.106
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Many companies now routinely run massive data analysis jobs, expressed in some scripting language, on large clusters of low-end servers. Many analysis scripts are complex and contain common subexpressions, that is, intermediate results that are subsequently joined and aggregated in multiple different ways. Applying conventional optimization techniques to such scripts will produce plans that execute a common subexpression multiple times, once for each consumer, which is clearly wasteful. Moreover, different consumers may have different physical requirements on the result: one consumer may want it partitioned on column A and another may want it partitioned on column B. To find a truly optimal plan, the optimizer must trade off such conflicting requirements in a cost-based manner. In this paper we show how to extend a Cascade-style optimizer to correctly optimize scripts containing common subexpressions. The approach has been prototyped in SCOPE, Microsoft's system for massive data analysis. Experimental analysis of both simple and large real-world scripts shows that the extended optimizer produces plans with 21% to 57% lower estimated costs.
Pages: 1337-1348
Number of pages: 12