Efficient Big Data Processing in Hadoop MapReduce

Cited by: 124
Authors
Dittrich, Jens [1, 2]
Quiane-Ruiz, Jorge-Arnulfo [1]
Affiliations
[1] Saarland Univ, Informat Syst Grp, Saarbrucken, Germany
[2] Saarland Univ, Comp Sci Databases, Saarbrucken, Germany
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2012 / Vol. 5 / No. 12
DOI
10.14778/2367502.2367562
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This tutorial is motivated by the clear need of many organizations, companies, and researchers to deal with big data volumes efficiently. Examples include web analytics applications, scientific applications, and social networks. A popular data processing engine for big data is Hadoop MapReduce. Early versions of Hadoop MapReduce suffered from severe performance problems. Today, this is becoming history. There are many techniques that can be used with Hadoop MapReduce jobs to boost performance by orders of magnitude. In this tutorial we teach such techniques. First, we will briefly familiarize the audience with Hadoop MapReduce and motivate its use for big data processing. Then, we will focus on different data management techniques, going from job optimization to physical data organization like data layouts and indexes. Throughout this tutorial, we will highlight the similarities and differences between Hadoop MapReduce and Parallel DBMS. Furthermore, we will point out unresolved research problems and open issues.
Pages: 2014-2015
Page count: 2