Dynamic Deduplication Decision in a Hadoop Distributed File System

Cited by: 2
Authors
Chang, Ruay-Shiung [1]
Liao, Chih-Shan [1]
Fan, Kuo-Zheng [1]
Wu, Chia-Ming [1]
Affiliation
[1] Natl Dong Hwa Univ, Dept Comp Sci & Informat Engn, Hualien 974, Taiwan
Keywords
CODES
DOI
10.1155/2014/630380
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the big data era, users generate and update data at tremendous speed from any device, at any time and in any place. Coping with such varied data in real time is a major challenge. The Hadoop Distributed File System (HDFS) is designed to manage data for building a distributed data center. HDFS keeps duplicate copies of data to increase reliability. However, these duplicates require a great deal of extra storage space and infrastructure investment. Deduplication can effectively improve storage utilization. In this paper, we propose a dynamic deduplication decision to improve the storage utilization of a data center that uses HDFS as its file system. The proposed system formulates a suitable deduplication strategy to make full use of the space available on a limited set of storage devices. The strategy deletes useless duplicates to free storage space. The experimental results show that our method can efficiently improve the storage utilization of a data center using HDFS.
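The abstract does not spell out the decision rule, so the following is only a rough illustration of the general idea of trading replicas for space, not the authors' algorithm. It assumes a hypothetical threshold policy: when raw utilization exceeds a set fraction of capacity, replicas of the least recently accessed files are removed first. The names `StoredFile`, `used_space`, `trim_replicas`, and the parameters `threshold` and `min_replicas` are all invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    name: str
    size_gb: float
    replicas: int      # current number of copies (HDFS defaults to 3)
    last_access: int   # hypothetical logical timestamp; larger = more recent


def used_space(files):
    """Raw space consumed, counting every replica of every file."""
    return sum(f.size_gb * f.replicas for f in files)


def trim_replicas(files, capacity_gb, threshold=0.8, min_replicas=1):
    """Assumed policy, not taken from the paper: walk files from coldest to
    hottest and remove replicas from each (down to min_replicas) until
    utilization is no longer above threshold * capacity_gb.
    Returns the file name once per replica removed."""
    limit = capacity_gb * threshold
    removed = []
    for f in sorted(files, key=lambda f: f.last_access):
        while f.replicas > min_replicas and used_space(files) > limit:
            f.replicas -= 1
            removed.append(f.name)
    return removed


files = [
    StoredFile("a", 10.0, 3, last_access=1),
    StoredFile("b", 20.0, 3, last_access=5),
    StoredFile("c", 5.0, 3, last_access=9),
]
removed = trim_replicas(files, capacity_gb=100.0)
print(removed, used_space(files))
```

In this sketch, lowering `min_replicas` frees more space at the cost of reliability, which is the trade-off the abstract describes; a real policy would also need a rule for deciding which duplicates count as "useless".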
Pages: 14