Dynamic Deduplication Decision in a Hadoop Distributed File System

Cited by: 2
Authors
Chang, Ruay-Shiung [1]
Liao, Chih-Shan [1]
Fan, Kuo-Zheng [1]
Wu, Chia-Ming [1]
Affiliation
[1] Natl Dong Hwa Univ, Dept Comp Sci & Informat Engn, Hualien 974, Taiwan
Keywords
CODES;
DOI
10.1155/2014/630380
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In the big data era, users generate and update data at tremendous speed, from any device, at any time, and anywhere. Coping with such multiform data in real time is a heavy challenge. The Hadoop Distributed File System (HDFS) is designed to handle data for building a distributed data center. HDFS uses data duplicates to increase data reliability. However, these duplicates require a lot of extra storage space and infrastructure funding. Deduplication can effectively improve the utilization of storage space. In this paper, we propose a dynamic deduplication decision to improve the storage utilization of a data center that uses HDFS as its file system. The proposed system formulates a suitable deduplication strategy to make full use of the storage space available on a limited set of storage devices. The strategy deletes useless duplicates to free storage space. Experimental results show that our method efficiently improves the storage utilization of a data center using HDFS.
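The abstract does not spell out the decision rule, so the following is only a minimal sketch of what a utilization-driven duplicate-removal policy might look like, not the authors' algorithm. The names `StoredFile`, `used_space`, and `reduce_replicas`, the 80% high-water mark, and the "largest file first" heuristic for choosing which duplicate counts as useless are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredFile:
    name: str
    size: int       # size of a single copy, in arbitrary units
    replicas: int   # number of copies currently stored


def used_space(files: List[StoredFile]) -> int:
    """Total space consumed by all copies of all files."""
    return sum(f.size * f.replicas for f in files)


def reduce_replicas(files: List[StoredFile], capacity: int,
                    high_water: float = 0.8,
                    min_replicas: int = 1) -> List[StoredFile]:
    """Drop one duplicate at a time until utilization is at or below
    the high-water mark, or until no file has a spare duplicate.

    Assumption: the duplicate of the largest file is removed first,
    because it frees the most space per deletion.
    """
    limit = capacity * high_water
    while used_space(files) > limit:
        candidates = [f for f in files if f.replicas > min_replicas]
        if not candidates:
            break  # nothing left to delete without losing data
        victim = max(candidates, key=lambda f: f.size)
        victim.replicas -= 1
    return files


files = [StoredFile("a", 10, 3), StoredFile("b", 20, 3)]
reduce_replicas(files, capacity=100)
print([(f.name, f.replicas) for f in files], used_space(files))
```

On an actual HDFS cluster the per-file replication factor can be changed with `hdfs dfs -setrep`; the sketch only models the decision, not the cluster operation.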
Pages: 14