Mastiff: A MapReduce-based System for Time-based Big Data Analytics

Cited by: 12
Authors
Guo, Sijie [1]
Xiong, Jin [1]
Wang, Weiping [1]
Lee, Rubao [2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Natural Science Foundation of China
Keywords
time-based data analytics; MapReduce
DOI
10.1109/CLUSTER.2012.10
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Existing MapReduce-based warehousing systems are not specifically optimized for time-based big data analysis applications. Such applications have two characteristics: 1) data are continuously generated and must be stored persistently for a long period of time; 2) applications usually process data from a specific time period, so typical queries use time-related predicates. Time-based big data analytics requires both high data loading speed and high query execution performance. However, existing systems, including current MapReduce-based solutions, do not solve this problem well because the two requirements conflict. We have implemented a MapReduce-based system, called Mastiff, that achieves both high data loading speed and high query performance. Mastiff exploits a systematic combination of a column group store structure and a lightweight helper structure. Furthermore, Mastiff uses an optimized table scan method and a column-based query execution engine to boost query performance. Based on extensive experimental results with diverse workloads, we show that Mastiff can significantly outperform existing systems including Hive, HadoopDB, and GridSQL.
Pages: 72-80
Number of pages: 9