INSTalytics: Cluster Filesystem Co-design for Big-data Analytics

Cited by: 0
Authors
Sivathanu, Muthian [1]
Vuppalapati, Midhul [1]
Gulavani, Bhargav S. [1]
Rajan, Kaushik [1]
Leeka, Jyoti [1]
Mohan, Jayashree [1, 2]
Kedia, Piyus [1, 3]
Affiliations
[1] Microsoft Res India, Bengaluru, Karnataka, India
[2] Univ Texas Austin, Austin, TX 78712 USA
[3] IIIT Delhi, New Delhi, India
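Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract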
We present the design, implementation, and evaluation of INSTalytics, a co-designed stack of a cluster file system and the compute layer, for efficient big data analytics in large-scale data centers. INSTalytics amplifies the well-known benefits of data partitioning in analytics systems; instead of traditional partitioning on one dimension, INSTalytics enables data to be simultaneously partitioned on four different dimensions at the same storage cost, enabling a larger fraction of queries to benefit from partition filtering and joins without network shuffle. To achieve this, INSTalytics uses compute-awareness to customize the 3-way replication that the cluster file system employs for availability. A new heterogeneous replication layout enables INSTalytics to preserve the same recovery cost and availability as traditional replication. INSTalytics also uses compute-awareness to expose a new sliced-read API that improves performance of joins by enabling multiple compute nodes to read slices of a data block efficiently via coordinated request scheduling and selective caching at the storage nodes. We have built a prototype implementation of INSTalytics in a production analytics stack, and show that recovery performance and availability are similar to those of physical replication, while providing significant improvements in query performance, suggesting a new approach to designing cloud-scale big-data analytics systems.
Pages: 235-248
Number of pages: 14
Related papers
50 in total
  • [21] Wireless Big-Data: Opportunity and the Design Challenging
    Ma, Jianguo
    Fu, Haipeng
    2016 IEEE MTT-S INTERNATIONAL CONFERENCE ON NUMERICAL ELECTROMAGNETIC AND MULTIPHYSICS MODELING AND OPTIMIZATION (NEMO), 2016,
  • [22] SIMD parallel MCMC sampling with applications for big-data Bayesian analytics
    Mahani, Alireza S.
    Sharabiani, Mansour T. A.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2015, 88 : 75 - 99
  • [23] Comparative Evaluation of Big-Data Systems on Scientific Image Analytics Workloads
    Mehta, Parmita
    Dorkenwald, Sven
    Zhao, Dongfang
    Kaftan, Tomer
    Cheung, Alvin
    Balazinska, Magdalena
    Rokem, Ariel
    Connolly, Andrew
    Vanderplas, Jacob
    AlSayyad, Yusra
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2017, 10 (11) : 1226 - 1237
  • [24] CROSS-PLATFORM AVIATION ANALYTICS USING BIG-DATA METHODS
    Larsen, Tulinda
    2013 INTEGRATED COMMUNICATIONS, NAVIGATION AND SURVEILLANCE CONFERENCE (ICNS), 2013,
  • [25] Design of Algorithms for Big Data Analytics
    Bhatnagar, Raj
    BIG DATA ANALYTICS, BDA 2015, 2015, 9498 : 101 - 107
  • [26] Big-data analytics framework for incorporating smallholders in sustainable palm oil production
    Shukla, Manish
    Tiwari, Manoj Kumar
    PRODUCTION PLANNING & CONTROL, 2017, 28 (16) : 1365 - 1377
  • [27] Adopt Big-Data Analytics to Explore and Exploit the New Value for Service Innovation
    Thuethongchai, Nopsaran
    Taiphapoon, Tatri
    Chandrachai, Achara
    Triukose, Sipat
    SOCIAL SCIENCES-BASEL, 2020, 9 (03):
  • [28] ARI CAROLINE THIS BIG-DATA GURU MINES ANALYTICS TO HELP CANCER PATIENTS
    Nordrum, Amy
    IEEE SPECTRUM, 2016, 53 (05) : 23 - 23
  • [29] On-Line Big-Data Processing for Visual Analytics with Argus-Panoptes
    Vlantis, Panayiotis, I
    Delis, Alex
    ALGORITHMIC ASPECTS OF CLOUD COMPUTING (ALGOCLOUD 2018), 2019, 11409 : 102 - 117
  • [30] Development of a Semi-Synthetic Dataset as a Testbed for Big-Data Semantic Analytics
    Techentin, Robert
    Foti, Daniel
    Li, Peter
    Daniel, Erik
    Gilbert, Barry
    Holmes, David
    Al-Saffar, Sinan
    2014 IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2014, : 252 - +