Mastiff: A MapReduce-based System for Time-based Big Data Analytics

Cited by: 12
Authors
Guo, Sijie [1]
Xiong, Jin [1]
Wang, Weiping [1]
Lee, Rubao [2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Natural Science Foundation of China
Keywords
time-based data analytics; MapReduce
DOI
10.1109/CLUSTER.2012.10
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Existing MapReduce-based warehousing systems are not specifically optimized for time-based big data analytics applications. Such applications have two characteristics: 1) data are generated continuously and must be stored persistently for a long period of time; 2) applications usually process data within some time period, so typical queries use time-related predicates. Time-based big data analytics requires both high data-loading speed and high query-execution performance. However, existing systems, including current MapReduce-based solutions, do not solve this problem well because the two requirements are contradictory. We have implemented a MapReduce-based system, called Mastiff, that achieves both high data-loading speed and high query performance. Mastiff exploits a systematic combination of a column group store structure and a lightweight helper structure. Furthermore, Mastiff uses an optimized table scan method and a column-based query execution engine to boost query performance. Based on extensive experimental results with diverse workloads, we show that Mastiff can significantly outperform existing systems including Hive, HadoopDB, and GridSQL.
Pages: 72-80
Page count: 9
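The abstract does not describe Mastiff's internals, so the following is only a hypothetical sketch of the general idea it names: rows are stored as column groups, and a small per-block helper structure (assumed here to be nothing more than the block's minimum and maximum timestamp) lets a scan skip blocks that a time-range predicate cannot match. The names `Block`, `build_block`, and `scan`, and the min/max-timestamp design itself, are illustrative assumptions, not Mastiff's actual structures or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Block:
    # Hypothetical layout: each column group maps column name -> list of values.
    column_groups: List[Dict[str, list]]
    # Assumed "lightweight helper" metadata: the time range this block covers.
    min_ts: int
    max_ts: int


def build_block(rows: Sequence[dict], groups: Sequence[Tuple[str, ...]], ts_col: str) -> Block:
    """Split rows into column groups and record the block's min/max timestamp."""
    column_groups = [{c: [r[c] for r in rows] for c in group} for group in groups]
    timestamps = [r[ts_col] for r in rows]
    return Block(column_groups, min(timestamps), max(timestamps))


def scan(blocks: Sequence[Block], ts_col: str, lo: int, hi: int,
         wanted: Sequence[str]) -> Tuple[List[tuple], int]:
    """Return (matching rows projected to `wanted`, number of skipped blocks).

    A block whose [min_ts, max_ts] range does not overlap [lo, hi] is skipped
    without reading its column data; otherwise only the needed columns are read.
    """
    out: List[tuple] = []
    skipped = 0
    needed = set(wanted) | {ts_col}
    for block in blocks:
        if block.max_ts < lo or block.min_ts > hi:
            skipped += 1  # pruned using metadata alone
            continue
        # Collect only the columns the query needs, from whichever groups hold them.
        cols: Dict[str, list] = {}
        for group in block.column_groups:
            for name in group.keys() & needed:
                cols[name] = group[name]
        # Apply the time predicate row by row inside the surviving block.
        for i, ts in enumerate(cols[ts_col]):
            if lo <= ts <= hi:
                out.append(tuple(cols[c][i] for c in wanted))
    return out, skipped
```

For example, with one block covering timestamps 1-2 and another covering 10-12, a query for the range 9-11 never touches the first block's data and returns only the row with timestamp 10. Keeping the helper to a pair of integers per block is meant to suggest why such a structure can stay cheap to maintain while data are loaded continuously; the real system may use a different structure.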