Mastiff: A MapReduce-based System for Time-based Big Data Analytics

Cited by: 12
Authors
Guo, Sijie [1]
Xiong, Jin [1]
Wang, Weiping [1]
Lee, Rubao [2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Natural Science Foundation of China;
Keywords
time-based data analytics; MapReduce;
DOI
10.1109/CLUSTER.2012.10
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Existing MapReduce-based warehousing systems are not specifically optimized for time-based big data analysis applications. Such applications have two characteristics: 1) data are generated continuously and must be stored persistently for a long period of time; 2) applications usually process data within a given time period, so typical queries use time-related predicates. Time-based big data analytics requires both high data loading speed and high query execution performance. However, existing systems, including current MapReduce-based solutions, do not solve this problem well because the two requirements conflict. We have implemented a MapReduce-based system, called Mastiff, which achieves both high data loading speed and high query performance. Mastiff exploits a systematic combination of a column group store structure and a lightweight helper structure. Furthermore, Mastiff uses an optimized table scan method and a column-based query execution engine to boost query performance. Based on extensive experimental results with diverse workloads, we show that Mastiff can significantly outperform existing systems including Hive, HadoopDB, and GridSQL.
Pages: 72-80
Number of pages: 9