Dynamic Deduplication Decision in a Hadoop Distributed File System

Cited by: 2
Authors
Chang, Ruay-Shiung [1]
Liao, Chih-Shan [1]
Fan, Kuo-Zheng [1]
Wu, Chia-Ming [1]
Affiliations
[1] Natl Dong Hwa Univ, Dept Comp Sci & Informat Engn, Hualien 974, Taiwan
Keywords
CODES;
DOI
10.1155/2014/630380
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In the big data era, users generate and update data extremely quickly, from any device, at any time, and from anywhere. Coping with such heterogeneous data in real time is a major challenge. The Hadoop Distributed File System (HDFS) is designed to manage data for building a distributed data center. HDFS keeps duplicates of data to increase reliability. However, these duplicates require considerable extra storage space and infrastructure investment. Deduplication can improve storage utilization effectively. In this paper, we propose a dynamic deduplication decision scheme to improve the storage utilization of a data center that uses HDFS as its file system. The proposed system formulates a suitable deduplication strategy to make full use of the storage space available on a limited set of storage devices. The strategy deletes unneeded duplicates to free storage space. The experimental results show that our method can efficiently improve the storage utilization of a data center using HDFS.
Pages: 14
Related papers
50 in total
  • [31] Dealing with Small Files Problem in Hadoop Distributed File System
    Bende, Sachin
    Shedge, Ashree
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMMUNICATION, COMPUTING AND VIRTUALIZATION (ICCCV) 2016, 2016, 79 : 1001 - 1012
  • [32] Towards a Better Replica Management for Hadoop Distributed File System
    Ciritoglu, Hilmi Egemen
    Saber, Takfarinas
    Buda, Teodora Sandra
    Murphy, John
    Thorpe, Christina
    2018 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS), 2018, : 104 - 111
  • [33] GDedup: Distributed File System Level Deduplication for Genomic Big Data
    Bartus, Paul
    Arzuaga, Emmanuel
    2018 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS), 2018, : 120 - 127
  • [34] LOAD REBALANCING FOR HADOOP DISTRIBUTED FILE SYSTEM USING DISTRIBUTED HASH TABLE
    Nithya, M.
    Maheshwari, N. Uma
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTELLIGENT SUSTAINABLE SYSTEMS (ICISS 2017), 2017, : 939 - 943
  • [35] A Distributed NameNode Cluster for a Highly-Available Hadoop Distributed File System
    Kim, Yonghwan
    Araragi, Tadashi
    Nakamura, Junya
    Masuzawa, Toshimitsu
    2014 IEEE 33RD INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS), 2014, : 333 - 334
  • [36] Optimization of Small Sized File Access Efficiency in Hadoop Distributed File System by Integrating Virtual File System Layer
    Alange, Neeta
    Mathur, Anjali
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (06) : 204 - 210
  • [37] Forensic Investigation Using RAM Analysis on the Hadoop Distributed File System
    Laing, Stuart
    Ludwiniak, Robert
    El Boudani, Brahim
    Chrysoulas, Christos
    Ubakanma, George
    Pitropakis, Nikolaos
    2023 19TH INTERNATIONAL CONFERENCE ON THE DESIGN OF RELIABLE COMMUNICATION NETWORKS, DRCN, 2023,
  • [38] Performance Evaluation and Tuning for MapReduce Computing in Hadoop Distributed File System
    Kim, Jongyeop
    Kumar, Ashwin T. K.
    George, K. M.
    Park, Nohpill
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2015, : 62 - 68
  • [39] A Distributed Cache for Hadoop Distributed File System in Real-time Cloud Services
    Zhang, Jing
    Wu, Gongqing
    Hu, Xuegang
    Wu, Xindong
    2012 ACM/IEEE 13TH INTERNATIONAL CONFERENCE ON GRID COMPUTING (GRID), 2012, : 12 - 21
  • [40] An overall approach to achieve load balancing for Hadoop Distributed File System
    Lin, Chi-Yi
    Lin, Ying-Chen
    INTERNATIONAL JOURNAL OF WEB AND GRID SERVICES, 2017, 13 (04) : 448 - 466