Experiences with Approximating Queries in Microsoft's Production Big-Data Clusters

Cited by: 7
Authors
Kandula, Srikanth [1]
Lee, Kukjin [1]
Chaudhuri, Surajit [1]
Friedman, Marc [1]
Affiliation
[1] Microsoft, Redmond, WA 98052 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2019 / Vol. 12 / Issue 12
DOI
10.14778/3352063.3352130
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the rapidly growing volume of data, it is more attractive than ever to leverage approximations to answer analytic queries. Sampling is a powerful technique that has been studied extensively as a means of facilitating approximation. Yet, there has been no large-scale study of the effectiveness of sampling techniques in big data systems. In this paper, we describe an in-depth study of the sampling-based approximation techniques that we have deployed in Microsoft's big data clusters. We explain the choices we made to implement approximation, identify the use cases, and study detailed data that sheds light on the usefulness of sampling-based approximation.
Pages: 2131-2142
Page count: 12