Mastiff: A MapReduce-based System for Time-based Big Data Analytics

Cited by: 12
Authors
Guo, Sijie [1]
Xiong, Jin [1]
Wang, Weiping [1]
Lee, Rubao [2]
Affiliations
[1] Institute of Computing Technology, Chinese Academy of Sciences, State Key Laboratory of Computer Architecture, Beijing, People's Republic of China
[2] Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210, USA
Funding
National Natural Science Foundation of China
Keywords
time-based data analytics; MapReduce
DOI
10.1109/CLUSTER.2012.10
Chinese Library Classification (CLC) number
TP3 [Computing technology; computer technology]
Discipline classification code
0812
Abstract
Existing MapReduce-based warehousing systems are not specifically optimized for time-based big data analytics applications. Such applications have two characteristics: 1) data are generated continuously and must be stored persistently for a long period of time; 2) applications usually process data within some time period, so typical queries use time-related predicates. Time-based big data analytics therefore requires both high data loading speed and high query execution performance. However, existing systems, including current MapReduce-based solutions, do not solve this problem well because the two requirements conflict. We have implemented a MapReduce-based system, called Mastiff, that achieves both high data loading speed and high query performance. Mastiff exploits a systematic combination of a column group store structure and a lightweight helper structure. Furthermore, Mastiff uses an optimized table scan method and a column-based query execution engine to boost query performance. Based on extensive experiments with diverse workloads, we show that Mastiff can significantly outperform existing systems including Hive, HadoopDB, and GridSQL.
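To illustrate how a lightweight helper structure can serve time-predicate queries without slowing data loading, here is a minimal sketch of a generic per-block min/max timestamp summary (a zone-map style technique). This is an assumption for illustration only and not Mastiff's actual helper structure, which this record does not describe; the names `TimeRangeIndex`, `BlockSummary`, `on_block_loaded`, and `candidate_blocks` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockSummary:
    """Hypothetical per-block summary: the time range covered by one data block."""
    block_id: int
    min_ts: int
    max_ts: int


@dataclass
class TimeRangeIndex:
    """Hypothetical lightweight helper (zone-map style), not Mastiff's design."""
    blocks: List[BlockSummary] = field(default_factory=list)

    def on_block_loaded(self, block_id: int, timestamps: List[int]) -> None:
        # Loading cost is a single pass over the block's timestamps: no sorting
        # and no global index maintenance, so ingestion stays append-only.
        if not timestamps:
            return  # an empty block can never match a time predicate
        self.blocks.append(BlockSummary(block_id, min(timestamps), max(timestamps)))

    def candidate_blocks(self, start: int, end: int) -> List[int]:
        # Keep only blocks whose [min_ts, max_ts] overlaps the query range
        # [start, end]; all other blocks can be skipped without being read.
        return [b.block_id for b in self.blocks
                if b.max_ts >= start and b.min_ts <= end]
```

The design trade-off mirrors the one the abstract describes: the summary costs almost nothing to maintain at load time, yet it lets a scan skip every block whose time range cannot satisfy the predicate.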
Pages: 72-80
Number of pages: 9