INSTalytics: Cluster Filesystem Co-design for Big-data Analytics

Cited by: 0
Authors
Sivathanu, Muthian [1]
Vuppalapati, Midhul [1]
Gulavani, Bhargav S. [1]
Rajan, Kaushik [1]
Leeka, Jyoti [1]
Mohan, Jayashree [1, 2]
Kedia, Piyus [1, 3]
Affiliations
[1] Microsoft Research India, Bengaluru, Karnataka, India
[2] University of Texas at Austin, Austin, TX 78712 USA
[3] IIIT Delhi, New Delhi, India
DOI: not available
Chinese Library Classification: TP3 [Computing technology, computer technology]
Discipline classification code: 0812
Abstract
We present the design, implementation, and evaluation of INSTalytics, a co-designed stack of a cluster file system and the compute layer for efficient big-data analytics in large-scale data centers. INSTalytics amplifies the well-known benefits of data partitioning in analytics systems: instead of traditional partitioning on one dimension, INSTalytics enables data to be simultaneously partitioned on four different dimensions at the same storage cost, so that a larger fraction of queries benefit from partition filtering and from joins without a network shuffle. To achieve this, INSTalytics uses compute-awareness to customize the 3-way replication that the cluster file system employs for availability. A new heterogeneous replication layout enables INSTalytics to preserve the same recovery cost and availability as traditional replication. INSTalytics also uses compute-awareness to expose a new sliced-read API that improves the performance of joins by enabling multiple compute nodes to read slices of a data block efficiently, via coordinated request scheduling and selective caching at the storage nodes. We have built a prototype implementation of INSTalytics in a production analytics stack and show that its recovery performance and availability are similar to those of physical replication, while it provides significant improvements in query performance, suggesting a new approach to designing cloud-scale big-data analytics systems.
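To make the partition-filtering idea concrete, here is a minimal Python sketch under a deliberately simplified, assumed model: each of three replicas is range-partitioned on a different column, and a planner routes a range query to the replica whose partitioning column matches the filter, then prunes partitions that cannot contain matching rows. The names `ReplicaLayout` and `plan_scan`, the column names, and the partition bounds are hypothetical. This is not the INSTalytics implementation, and it does not model how the system reaches four partitioning dimensions, its heterogeneous recovery layout, or the sliced-read API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReplicaLayout:
    """Hypothetical model: one replica, range-partitioned on a single column."""
    column: str
    # Sorted lower bounds; partition i covers [bounds[i], bounds[i + 1]),
    # and the last partition is open-ended.
    bounds: List[int]

    def partitions_for(self, lo: int, hi: int) -> List[int]:
        """Indices of partitions that may hold values in the closed range [lo, hi]."""
        first = max(bisect_right(self.bounds, lo) - 1, 0)
        last = max(bisect_right(self.bounds, hi) - 1, 0)
        return list(range(first, last + 1))


def plan_scan(replicas: List[ReplicaLayout], column: str,
              lo: int, hi: int) -> Tuple[int, List[int]]:
    """Return (replica index, partitions to read) for a range filter on `column`.

    Prefers a replica partitioned on the filter column so most partitions are
    skipped; otherwise falls back to a full scan of replica 0.
    """
    for idx, replica in enumerate(replicas):
        if replica.column == column:
            return idx, replica.partitions_for(lo, hi)
    return 0, list(range(len(replicas[0].bounds)))


# Usage: three replicas of the same data, each partitioned on a different column.
replicas = [
    ReplicaLayout("user_id", [0, 1000, 2000, 3000]),
    ReplicaLayout("day", [0, 7, 14, 21]),
    ReplicaLayout("region", [0, 10, 20]),
]
print(plan_scan(replicas, "day", 8, 13))    # reads 1 of 4 partitions on replica 1
print(plan_scan(replicas, "price", 0, 5))   # no matching layout: full scan
```

With a single partitioning dimension, only queries that filter on that one column avoid a full scan; giving each replica its own layout widens that set without extra copies, which is the benefit the abstract describes.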
Pages: 235-248 (14 pages)
Related papers (50 in total)
  • [31] Big-Data Mechanisms and Energy-Policy Design
    Pat, Ankit
    Larson, Kate
    Keshav, S.
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 3887 - 3893
  • [32] A Secure and Intelligent Framework for Vehicle Health Monitoring Exploiting Big-Data Analytics
    Rahman, Md Arafatur
    Rahim, Md Abdur
    Rahman, Md Mustafizur
    Moustafa, Nour
    Razzak, Imran
    Ahmad, Tanvir
    Patwary, Mohammad N.
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (10) : 19727 - 19742
  • [33] Cross-Platform Aviation Analytics Using Big-Data Integration Methods
    Larsen, Tulinda
    2013 INTEGRATED COMMUNICATIONS, NAVIGATION AND SURVEILLANCE CONFERENCE (ICNS), 2013
  • [34] A cloud-based architecture for Big-Data Analytics in Smart Grid: A Proposal
    Mayilvaganan, M.
    Sabitha, M.
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2013, : 256 - 259
  • [35] Evolutionary Scheduling of Dynamic Multitasking Workloads for Big-Data Analytics in Elastic Cloud
    Zhang, Fan
    Cao, Junwei
    Tan, Wei
    Khan, Samee U.
    Li, Keqin
    Zomaya, Albert Y.
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2014, 2 (03) : 338 - 351
  • [36] Big-Data Analysis, Cluster Analysis, and Machine-Learning Approaches
    Alonso-Betanzos, Amparo
    Bolon-Canedo, Veronica
    SEX-SPECIFIC ANALYSIS OF CARDIOVASCULAR FUNCTION, 2018, 1065 : 607 - 626
  • [37] Wavelength-Selective Fog-Computing Network for Big-Data Analytics of Wireless Data
    Meyer, Michael Conrad
    Wang, Yu
    Watanabe, Takahiro
    2019 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2019, : 154 - 160
  • [38] Commuting inequity and its determinants in Shanghai: New findings from big-data analytics
    Zhao, Pengjun
    Cao, Yushu
    TRANSPORT POLICY, 2020, 92 : 20 - 37
  • [39] Opportunistic Physical Design for Big Data Analytics
    LeFevre, Jeff
    Sankaranarayanan, Jagan
    Hacigumus, Hakan
    Tatemura, Junichi
    Polyzotis, Neoklis
    Carey, Michael J.
    SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, : 851 - 862
  • [40] A big-data analytics method for capturing visitor activities and flows: the case of an island country
    Miah, Shah Jahan
    HuyQuan Vu
    Gammack, John
    INFORMATION TECHNOLOGY & MANAGEMENT, 2019, 20 (04) : 203 - 221