A BigData MapReduce Hadoop Distribution Architecture for Processing Input Splits to solve the Small Data Problem

Cited by: 0
Authors
Manjunath, R. [1]
Tejus [1]
Channabasava, R. K. [1]
Balaji, S. [2]
Affiliations
[1] City Engn Coll, Dept CSE, Hyderabad, Andhra Pradesh, India
[2] Jain Univ, Bengaluru, India
Source
PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON APPLIED AND THEORETICAL COMPUTING AND COMMUNICATION TECHNOLOGY (ICATCCT) | 2016
Keywords
Hadoop; MapReduce; input splits
DOI
Not available
Chinese Library Classification number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Hadoop is an open-source Java framework for processing big data. It has two core components: HDFS (Hadoop Distributed File System), a fault-tolerant file system that keeps operating through hardware or software faults and stores very large volumes of data on inexpensive hardware, and MapReduce, a processing technique and programming model that runs in a parallel and distributed manner. Hadoop does not perform well on small files: a large number of small files places a heavy load on the HDFS NameNode, which in turn prolongs the execution time of MapReduce jobs. Because Hadoop is designed to handle large data sets, it incurs a performance cost when dealing with a large number of small files. This paper gives a detailed description of HDFS, surveys existing ways of dealing with the small-file problem, and proposes an approach for handling small files. In the proposed approach, small files are merged using MapReduce, the programming model on Hadoop. This improves Hadoop's performance in handling small files, that is, files smaller than the block size. We also propose a traffic analyzer that combines Hadoop with the MapReduce paradigm. Combining Hadoop and MapReduce programming tools makes it possible to provide batch analysis with minimal response time and in-memory computing capacity, so that logs are processed in a highly available, efficient, and stable way.
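The merging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Hadoop itself: it only shows, in plain Python, how many small files might be grouped so that each merged group fits within one block. The function name `pack_small_files`, the 128 MB block size, and the sample file names are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed block size for illustration (128 MB); a real cluster may differ.
BLOCK_SIZE = 128 * 1024 * 1024


def pack_small_files(files: List[Tuple[str, int]],
                     block_size: int = BLOCK_SIZE) -> List[List[str]]:
    """Greedily group (name, size) pairs so each group totals at most block_size.

    A file that is itself larger than block_size ends up alone in its own group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Close the current group when this file would overflow the block.
        if current and current_size + size > block_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # 1000 hypothetical log files of 1 MB each.
    files = [(f"log-{i:04d}.txt", 1024 * 1024) for i in range(1000)]
    groups = pack_small_files(files)
    print(len(files), "input files ->", len(groups), "merged groups")
```

Under these assumptions, fewer groups means fewer entries for the NameNode to track and fewer map tasks to schedule, which is the effect the abstract attributes to merging.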
Pages: 480-487
Page count: 8